2025-03-30 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
Many newcomers are unclear about how ElasticSearch is used. To help solve this problem, this article explains it in detail; anyone with this need can learn from it, and hopefully you will gain something.
Here is an introduction to the essential knowledge of ElasticSearch: getting started, index management, and a detailed look at mapping.
1. Getting started
1. Check the health status of the cluster
http://localhost:9200/_cat
http://localhost:9200/_cat/health?v
Note: the v parameter asks for a header row in the result.
Status value description:
Green - everything is good (the cluster is fully functional); this is the best state.
Yellow - all data is available, but some replicas are not yet allocated (the cluster is still fully functional); the data and the cluster are usable, but some replica shards are missing.
Red - some data is not available for whatever reason (the cluster is only partially functional); at least one primary shard is unassigned.
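These rules can be sketched as a tiny illustrative function (this is just the decision logic described above, not ES source code):

```python
def cluster_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map unassigned shard counts to a cluster health color.

    red: at least one primary shard is unassigned (some data unavailable)
    yellow: all primaries allocated, but some replicas are not
    green: every primary and replica shard is allocated
    """
    if unassigned_primaries > 0:
        return "red"
    if unassigned_replicas > 0:
        return "yellow"
    return "green"

print(cluster_status(0, 0))  # green
print(cluster_status(0, 2))  # yellow
print(cluster_status(1, 0))  # red
```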
View the nodes of the cluster
http://localhost:9200/_cat/nodes?v
2. View all indexes
GET /_cat/indices?v
3. Create an index
Create an index named customer. The pretty parameter asks for a pretty-printed JSON result.
PUT /customer?pretty
Check all the indexes again.
GET /_cat/indices?v
4. Index a document into the customer index
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe"
}
'
5. Get the document of the specified id from the customer index
curl -X GET "localhost:9200/customer/_doc/1?pretty"
6. Query all documents
GET /customer/_search?q=*&sort=name:asc&pretty
JSON format:
GET /customer/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "name": "asc" }
  ]
}
II. Index management
1. Create an index
Create an index named twitter, setting the number of shards to 3 and the number of replicas to 2. Note: creating an index in ES is similar to creating a database in a relational database (and, since ES 6.0, also similar to creating a table).
PUT twitter
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}
Description:
The default number of shards is 5; the maximum is 1024.
The default number of replicas is 1.
The index name must be lowercase and must not duplicate an existing index.
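As a quick illustration of the naming rule (a sketch only; ES's full validation also rejects characters such as \ / * ? " < > | and names starting with -, _, or +):

```python
def is_valid_index_name(name: str, existing_indexes: set) -> bool:
    """Check the two rules above: the name must be all lowercase
    and must not duplicate an existing index."""
    return name == name.lower() and name not in existing_indexes

print(is_valid_index_name("twitter", {"customer"}))   # True
print(is_valid_index_name("Twitter", {"customer"}))   # False (uppercase)
print(is_valid_index_name("customer", {"customer"}))  # False (duplicate)
```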
The create command can also be abbreviated to:
PUT twitter
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
2. Create a mapping (Mapping)
Note: creating a mapping in ES is similar to defining a table structure in a database: what fields the table has, their types, their default values, and so on. It is also similar to the schema definition in solr.
PUT twitter
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  },
  "mappings": {
    "type1": {
      "properties": {
        "field1": { "type": "text" }
      }
    }
  }
}
3. Add an alias definition when creating an index
PUT twitter
{
  "aliases": {
    "alias_1": {},
    "alias_2": {
      "filter": { "term": { "user": "kimchy" } },
      "routing": "kimchy"
    }
  }
}
4. Description of the result returned when the index is created
5. Get Index to view the definition information of the index
GET /twitter. You can get more than one index at a time (comma-separated), get all indexes with _all, or use the wildcard *.
GET /twitter/_settings
GET /twitter/_mapping
6. Delete index
DELETE /twitter
Description:
You can delete more than one index at a time (comma-separated), delete all indexes with _all, or use the wildcard *.
7. Determine whether the index exists
HEAD twitter
The HTTP status code gives the result: 404 means the index does not exist.
8. Modify the settings information of the index
The settings of an index fall into two parts: static and dynamic. Static settings, such as the number of shards, cannot be changed after creation; dynamic settings can be modified.
REST access endpoints:
/_settings updates the settings of all indexes.
{index}/_settings updates the settings of one or more indexes.
For detailed configuration items, please refer to: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-modules-settings
9. Modify the number of backups
PUT /twitter/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}
10. To reset a setting to its default value, use null
PUT /twitter/_settings
{
  "index": {
    "refresh_interval": null
  }
}
11. Set the read and write of the index
index.blocks.read_only: if true, the index and its metadata are read-only.
index.blocks.read_only_allow_delete: if true, read-only but deletion is allowed.
index.blocks.read: if true, the index is not readable.
index.blocks.write: if true, the index is not writable.
index.blocks.metadata: if true, the index metadata is neither readable nor writable.
12. Index template
Writing the definition for every index you create can be a tedious task. ES provides index templates: a template defines settings and mappings together with a pattern that matches the names of indexes as they are created.
Note: the template is referenced only when the index is created. Modifying the template will not affect the index already created.
12.1 Add / modify a template named template_1 that matches creation of indexes named te* or bar*:
PUT _template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "_source": { "enabled": false },
      "properties": {
        "host_name": { "type": "keyword" },
        "created_at": { "type": "date", "format": "EEE MMM dd HH:mm:ss Z YYYY" }
      }
    }
  }
}
12.2 View Index template
GET /_template/template_1
GET /_template/temp*
GET /_template/template_1,template_2
GET /_template
12.3 Delete template
DELETE /_template/template_1
13. Open/Close Index: open or close an index
POST /my_index/_close
POST /my_index/_open
Description:
Closed indexes cannot be read or written and take up almost no cluster overhead.
Closed indexes can be opened, following the normal recovery process.
14. Shrink Index: shrink an index
The number of shards in an index is immutable. If you want fewer shards, you can shrink the index into a new one. The shard count of the new index must be a factor of the original shard count: if the original has 8 shards, the new index can have 4, 2, or 1.
When do I need to shrink the index?
When the index was first created, the shard count was set too high; later you find you do not need that many shards, so you shrink.
The shrink process:
First, move all primary shards to one node.
Create a new index on this node with fewer shards and the other settings identical to the original index.
Copy (or hard-link) all the segments of the original index into the directory of the new index.
Open the new index to recover the shard data.
(Optional) rebalance the shards of the new index across other nodes.
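The factor rule for the target shard count can be sketched as follows (illustrative only, not ES source code):

```python
def valid_shrink_targets(source_shards: int) -> list:
    """Valid target shard counts for a shrink: every factor of the
    source shard count, excluding the source count itself."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]

print(valid_shrink_targets(8))  # [1, 2, 4]
print(valid_shrink_targets(5))  # [1]
```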
Preparation before shrinking:
Set the original index to read-only.
Relocate one copy of every shard of the original index to the same node, and make sure the index health is green.
PUT /my_source_index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "shrink_node_name",
    "index.blocks.write": true
  }
}
Perform the shrink:
POST my_source_index/_shrink/my_target_index
{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  }
}
Monitor the shrink process:
GET _cat/recovery?v
GET _cluster/health
15. Split Index: split an index
When the shards of an index grow too large, the index can be split into a new index with more shards. How far it can be split is determined by index.number_of_routing_shards, specified when the index is created: the number of routing shards defines the hash space in which documents are routed to shards by consistent hashing.
If index.number_of_routing_shards = 30 and the specified number of shards is 5, you can split it in the following multiples:
5 → 10 → 30 (split by 2, then by 3)
5 → 15 → 30 (split by 3, then by 2)
5 → 30 (split by 6)
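The multiples above can be derived with a small sketch: a split target must be a multiple of the current shard count and must itself divide number_of_routing_shards evenly (illustrative logic, not ES source code):

```python
def valid_split_targets(source_shards: int, routing_shards: int) -> list:
    """Target counts reachable in one split from source_shards,
    given the fixed index.number_of_routing_shards."""
    return [n for n in range(source_shards + 1, routing_shards + 1)
            if n % source_shards == 0 and routing_shards % n == 0]

print(valid_split_targets(5, 30))   # [10, 15, 30]
print(valid_split_targets(10, 30))  # [30]
```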
Why do I need a split index?
The opposite of shrinking: you split an index when the initially configured shard count turns out to be too small.
Note: only indexes created with index.number_of_routing_shards specified can be split; from ES 7 onward this restriction is removed.
The difference from solr: solr splits a single shard, while in es the entire index is split.
Split steps:
Prepare an index to split:
PUT my_source_index
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_routing_shards": 2
  }
}
Set the index read-only first:
PUT /my_source_index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}
Do the split:
POST my_source_index/_split/my_target_index
{
  "settings": {
    "index.number_of_shards": 2
  }
}
Monitor the split process:
GET _cat/recovery?v
GET _cluster/health
16. Rollover Index: roll an alias over to a newly created index
For time-based index data such as logs, old data becomes useless after a while. Just as you might create per-period tables in a database, you can store data for different periods in separate indexes in ES. What is more convenient than in a database is that ES can roll an alias over to the latest index, so operating through the alias always targets the newest index.
ES's rollover index API lets us create a new index when specified conditions are met (age, document count, index size) and roll the alias over to the new index.
Note: for rollover, the alias must point to exactly one index.
Rollover Index example:
Create an index named logs-000001 with alias logs_write:
PUT /logs-000001
{
  "aliases": {
    "logs_write": {}
  }
}
Add 1000 documents to the index logs-000001, then set the rollover conditions for the alias:
POST /logs_write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 1000,
    "max_size": "5gb"
  }
}
Description:
If the alias logs_write points to an index created 7 or more days ago, or the index's document count is >= 1000, or the index's size is >= 5gb, a new index logs-000002 is created and the alias logs_write is pointed at the newly created logs-000002 index.
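The OR semantics of these conditions can be sketched as a hypothetical helper (not part of ES; the parameter names mirror the request body above):

```python
def should_rollover(age_days, doc_count, size_gb,
                    max_age_days=7, max_docs=1000, max_size_gb=5):
    """Rollover fires as soon as ANY of the configured conditions is met."""
    return (age_days >= max_age_days
            or doc_count >= max_docs
            or size_gb >= max_size_gb)

print(should_rollover(age_days=2, doc_count=100, size_gb=0.1))   # False
print(should_rollover(age_days=2, doc_count=1000, size_gb=0.1))  # True
```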
Naming rules for Rollover Index new indexes:
If the index name ends in a dash followed by a number, such as logs-000001, the new index name follows the same pattern with the number incremented by 1.
If the index name does not end in a dash followed by a number, you must specify the new index name when calling the rollover api:
POST /my_alias/_rollover/my_new_index_name
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 1000,
    "max_size": "5gb"
  }
}
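The naming rule can be sketched as follows (a sketch only; this version keeps the original zero-padding width, while ES pads generated names to six digits):

```python
import re

def next_rollover_name(name):
    """If the name ends in '-<digits>', increment the number, keeping the
    zero padding; otherwise return None (a new name must be given explicitly)."""
    m = re.fullmatch(r"(.*-)(\d+)", name)
    if m is None:
        return None
    prefix, digits = m.group(1), m.group(2)
    return prefix + str(int(digits) + 1).zfill(len(digits))

print(next_rollover_name("logs-000001"))  # logs-000002
print(next_rollover_name("my_logs"))      # None
```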
Use Date math (time expression) in the name
If you want to generate an index name with a date, such as logstash-2016.02.03-1, you can use a time expression to name it when you create the index:
# PUT /<logs-{now/d}-1> with URI encoding:
PUT /%3Clogs-%7Bnow%2Fd%7D-1%3E
{
  "aliases": {
    "logs_write": {}
  }
}

PUT logs_write/_doc/1
{
  "message": "a dummy log"
}

POST logs_write/_refresh

# Wait for a day to pass

POST /logs_write/_rollover
{
  "conditions": {
    "max_docs": "1"
  }
}
The new index can be defined when Rollover:
PUT /logs-000001
{
  "aliases": {
    "logs_write": {}
  }
}

POST /logs_write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 1000,
    "max_size": "5gb"
  },
  "settings": {
    "index.number_of_shards": 2
  }
}
Dry run: test whether the conditions are met without actually rolling over:
POST /logs_write/_rollover?dry_run
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 1000,
    "max_size": "5gb"
  }
}
Description:
The dry run does not create an index; it only checks whether the conditions are met.
Note: rollover happens only when you request it; it does not run automatically in the background. You can call it periodically.
17. Index monitoring
17.1 View index status information
View all index status:
GET /_stats
View the status information of the specified index:
GET /index1,index2/_stats
17.2 View index segment information
GET /test/_segments
GET /index1,index2/_segments
GET /_segments
17.3 View index recovery information
GET index1,index2/_recovery?human
GET /_recovery?human
17.4 View the storage information of index fragments
# return information of only index test
GET /test/_shard_stores
# return information of only test1 and test2 indices
GET /test1,test2/_shard_stores
# return information of all indices
GET /_shard_stores
GET /_shard_stores?status=green
18. Index state management
18.1 Clear Cache cleans the cache
POST /twitter/_cache/clear
By default all caches are cleared. You can specify clearing only the query, fielddata, or request caches.
POST /kimchy,elasticsearch/_cache/clear
POST /_cache/clear
18.2 Refresh: re-open the index for reading so recent changes become searchable
POST /kimchy,elasticsearch/_refresh
POST /_refresh
18.3 Flush: flush index data cached in memory to persistent storage
POST twitter/_flush
18.4 Force merge: force segment merging
POST /kimchy/_forcemerge?only_expunge_deletes=false&max_num_segments=100&flush=true
Optional parameter description:
max_num_segments: merge down to this many segments; default is 1.
only_expunge_deletes: merge only segments that contain deleted documents; default is false.
flush: whether to flush after merging; default is true.
POST /kimchy,elasticsearch/_forcemerge
POST /_forcemerge
III. Mapping in detail
1. What is the Mapping mapping?
Mapping defines what fields an index has, their types, and other structural information. It is equivalent to a table definition in a database, or a schema in solr, because lucene needs to know how to index and store a document's fields when indexing it.
Manual mapping and dynamic mapping are supported in ES.
1.1. Create a mapping for the index
PUT test
{
  "mappings": {
    "type1": {
      "properties": {
        "field1": { "type": "text" }
      }
    }
  }
}
Description: mapping definition can be modified later
2. The removal of mapping types
ES was originally designed with the index analogous to a relational database and the mapping type analogous to a table, with one index able to contain multiple mapping types. This analogy has a serious problem: when multiple mapping types contain fields with the same name (especially same-named fields of different types), they are hard to handle in one index, because a search engine only has an index-document structure, and the data of different mapping types are all just documents (containing different fields) in the same index.
Starting from 6.0.0, an index is limited to a single mapping type ("index.mapping.single_type": true), while remaining compatible with the multiple mapping types of 5.x. Mapping types will be removed in 7.0.
To match this plan, define the single mapping type name as "_doc", since index request addresses will take the form PUT {index}/_doc/{id} and POST {index}/_doc.
Mapping example:
PUT twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "type": { "type": "keyword" },
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" },
        "content": { "type": "text" },
        "tweeted_at": { "type": "date" }
      }
    }
  }
}
To move data from multiple mapping types into separate indexes, ES provides the reindex API.
3. Field type datatypes
A field's type defines how its values are indexed and stored. ES provides a rich set of field types. See the official documentation for the characteristics of each type:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
3.1 Core datatypes
String datatypes: text, keyword
Numeric datatypes: long, integer, short, byte, double, float, half_float, scaled_float
Date datatype: date
Boolean datatype: boolean
Binary datatype: binary
Range datatypes: integer_range, float_range, long_range, double_range, date_range
3.2 Complex datatypes
Array datatype: arrays are multi-valued and need no special type
Object datatype object: the value is a JSON object
Nested datatype nested: for arrays of JSON objects
3.3 Geo datatypes
Geo-point datatype geo_point: for lat/lon points (latitude and longitude)
Geo-Shape datatype geo_shape: for complex shapes like polygons
3.4 Specialised datatypes
IP datatype ip: for IPv4 and IPv6 addresses
Completion datatype completion: to provide auto-complete suggestions
Token count datatype token_count: to count the number of tokens in a string
mapper-murmur3 murmur3: to compute hashes of values at index time and store them in the index
Percolator type: accepts queries from the query-dsl
join datatype: defines a parent/child relation for documents within the same index
4. Introduction to field definition properties
A field's type (Datatype) defines how its values are indexed and stored; in addition, there are properties that let us override the defaults or make special definitions as needed:
analyzer: specify the analyzer
normalizer: specify the normalizer
boost: specify a weight value
coerce: type coercion
copy_to: copy the value to another field
doc_values: whether to store doc values
dynamic
enabled
fielddata: whether the field can use fielddata
eager_global_ordinals
format: specify the date format
ignore_above
ignore_malformed
index_options
index: whether to index the field
norms
null_value
position_increment_gap
properties
search_analyzer
similarity
store
term_vector
Field definition properties - example:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}
5. Multi Field multiple fields
When a field needs to be indexed in several different ways, use the fields multi-field definition. For example, a string field may need a text full-text index and also a keyword index to support sorting and aggregation, or indexes with different analyzers.
Example:
Define multiple fields:
Description: raw is a multi-field version of city (the name raw is arbitrary)
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        }
      }
    }
  }
}
Index some documents:
PUT my_index/_doc/1
{
  "city": "New York"
}

PUT my_index/_doc/2
{
  "city": "York"
}
Query, sort, and aggregate using the multi-fields:
GET my_index/_search
{
  "query": {
    "match": { "city": "york" }
  },
  "sort": { "city.raw": "asc" },
  "aggs": {
    "Cities": {
      "terms": { "field": "city.raw" }
    }
  }
}
6. Meta field
Meta fields are document fields defined in ES and have the following categories:
7. Dynamic mapping
Dynamic mapping is an important ES feature: it lets us use ES quickly without creating an index and defining a mapping first. For example, we can submit a document directly to ES for indexing:
PUT data/_doc/1 {"count": 5}
ES automatically creates the data index, the _doc mapping, and a field count of type long for us.
When a document being indexed contains new fields, ES automatically adds them to the mapping according to the JSON data types of their values.
7.1 Field dynamic Mapping rules
7.2 Date detection
Date detection means that when we index data, ES automatically checks whether string values look like dates, and if so maps the field as a date with the configured format.
date_detection is enabled by default, and the default dynamic_date_formats are:
["strict_date_optional_time", "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]

PUT my_index/_doc/1
{
  "create_date": "2015-09-02"
}

GET my_index/_mapping
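The detection logic can be sketched in Python (the strptime patterns are stand-ins for ES's date formats; "%Y-%m-%d" approximates strict_date_optional_time):

```python
from datetime import datetime

def looks_like_date(value, formats=("%Y-%m-%d", "%Y/%m/%d %H:%M:%S", "%Y/%m/%d")):
    """Try each configured format in turn; the first that parses wins."""
    if not isinstance(value, str):
        return False
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False

print(looks_like_date("2015-09-02"))  # True
print(looks_like_date("hello"))       # False
```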
Custom time format:
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic_date_formats": ["MM/dd/yyyy"]
    }
  }
}
Disable time detection:
PUT my_index
{
  "mappings": {
    "_doc": {
      "date_detection": false
    }
  }
}
7.3 Numeric detection
Turn on numeric detection (disabled by default)
PUT my_index
{
  "mappings": {
    "_doc": {
      "numeric_detection": true
    }
  }
}

PUT my_index/_doc/1
{
  "my_float": "1.0",
  "my_integer": "1"
}