[ELK] 03, ES Cluster and ES query

2025-07-01 Update From: SLTechnology News&Howtos shulou

The previous article covered the installation of ES and its plug-ins. This one covers the ES cluster and its node management.

I. Overview of ES clusters

1. Brief introduction of ES cluster

ES is built for high availability and scalability. Capacity can be expanded by purchasing more powerful servers (vertical scaling, or scaling up) or by purchasing more servers (horizontal scaling, or scaling out). Although ES can take advantage of more powerful hardware, vertical scaling has its limits. Real scalability comes from horizontal scaling: distributing the load and increasing reliability by adding more nodes to the cluster.

In most databases, horizontal scaling usually requires a major refactoring of the application before it can take advantage of more nodes.

In contrast, ES is inherently distributed: it knows how to manage multiple nodes to scale and to achieve high availability. This also means that your application does not need to care about it.

2. The master node of the ES cluster

One node in the cluster is elected as the master node (Master Node). It is responsible for managing cluster-wide changes, such as creating or deleting an index (Index) and adding nodes to or removing nodes from the cluster. The master node does not need to take part in document-level changes or searches, which means that although there is only one master node, it does not become a bottleneck as traffic increases. Any node can become the master node. In our example there is only one node, so it takes on the role of the master node.

As a user, you can communicate with any node in the cluster, including the master node. Each node knows where each document is stored and can forward the request to the node that holds the required data. The node the user communicates with is responsible for collecting the required data from the other nodes and returning it to the user. The whole process is managed transparently by ES.
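As a rough illustration of how any node can work out where a document lives, the sketch below derives a shard number from a routing value (by default the document ID) taken modulo the number of primary shards. The function name pick_shard is hypothetical, and zlib.crc32 is only a stand-in: ES uses its own Murmur3-based hash, so the shard numbers here will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Return the shard number for a routing value (by default the document ID).

    Assumption: CRC32 stands in for the hash that ES actually uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The same ID always lands on the same shard, so any node can locate it.
for doc_id in ("1", "2", "yw"):
    print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the result depends on the number of primary shards, that number cannot be changed after an index is created without reindexing.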

3. Characteristics of ES cluster

Once the Elasticsearch cluster is established, a master is elected, and the rest are slave nodes.

During normal operation, however, every node serves both reads and writes. No matter which node you write to, the data is distributed across the nodes of the cluster.

Consider the case where a node goes down. If a slave node dies, the first concern is whether data will be lost. It will not: if replicas are enabled, the data is also stored on another machine.

Replica shards on other nodes are automatically promoted to primary shards for the affected data. Note that the cluster will be in yellow status for a short period while this happens.

What if the master node dies? When the slave nodes find that they can no longer reach the master node, they elect another master node.

This raises the split-brain problem. Suppose there are five machines, three in one data center and two in another. When the connection between the two data centers is broken, the nodes in each data center hold their own election and choose a master node.

At that point there are two master nodes, and when the connection between the data centers is restored, the data will conflict.

The solution is to set the parameter:

discovery.zen.minimum_master_nodes

If it is set to 3 (more than half of the nodes), then when the connection between the two data centers is broken, the data center that still has at least 3 nodes keeps a master and continues to serve, while the nodes in the other data center stop serving.
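The "more than half" rule can be sketched as a small calculation. The function names below are hypothetical helpers for illustration, not part of ES.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Smallest node count that is strictly more than half of the nodes."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(nodes_in_partition: int, quorum: int) -> bool:
    """A partition may elect a master only if it still holds a quorum."""
    return nodes_in_partition >= quorum


quorum = minimum_master_nodes(5)
print(quorum)                       # 3
print(can_elect_master(3, quorum))  # True: the data center with three nodes
print(can_elect_master(2, quorum))  # False: the data center with two nodes
```

With five nodes split three and two, only one side can reach the quorum, so two masters can never exist at the same time.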

It is also easy to see that if a single node is exposed directly to clients, there is a single point of failure no matter how the master is switched. So we usually put a load balancer in front of the nodes that serve requests.

4. Automatic discovery function of ES cluster

Elasticsearch clusters have auto-discovery built in.

This means you only need to configure the cluster name and node name on each node. Nodes that can reach each other then find the other nodes on the network that are configured with the same cluster name, using ES's own discovery protocol.

Like other service discovery mechanisms, ES supports both multicast and unicast. They are configured with these parameters:

discovery.zen.ping.multicast.enabled: false # turns off multicast auto-discovery, to prevent nodes on other machines from joining automatically
discovery.zen.fd.ping_timeout: 100s
discovery.zen.ping.timeout: 10s # sets the ping timeout for connections between nodes
discovery.zen.minimum_master_nodes: 2 # avoids split brain; for example, in a 3-node cluster set to 2, a node that becomes detached will not elect itself master
discovery.zen.ping.unicast.hosts: ["12.12.12.12:10801"] # the nodes to contact for unicast discovery

Multicast depends on whether the network supports it. For security reasons, basic cloud services (such as Aliyun) do not support multicast, so even if you enable multicast mode you can only find nodes on the local machine.

Unicast mode is safe and efficient. Its disadvantage is that when a new machine is added, it has to be configured on every node before it takes effect.

II. Management of ES clusters

An ES cluster provides a RESTful API.

ES access interface: 9200/tcp, over the HTTP protocol

1. Restful style API

Four types of API:

Check the health of clusters, nodes, indexes, etc., and get their corresponding status

Manage clusters, nodes, indexes, and metadata

Perform CRUD operation

Perform advanced operations, such as paging, filtering, etc.

Syntax format:

curl -X VERB 'PROTOCOL://HOST:PORT/PATH?QUERY_STRING' -d 'BODY'

VERB:

GET, PUT, DELETE, etc. (GET is the default and can be omitted)

PROTOCOL:

http, https

QUERY_STRING:

Query parameters; for example, ?pretty prints the JSON response in an easy-to-read format

BODY:

The body of the request
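The same parts can be assembled programmatically. The following is a minimal sketch using only the Python standard library; build_request is a hypothetical helper, and it only constructs the request object without sending it, so no running cluster is needed to try it.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_request(verb, protocol, host, port, path, query=None, body=None):
    """Assemble VERB, PROTOCOL, HOST:PORT, PATH, QUERY_STRING and BODY into a request."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        url += "?" + urlencode(query)
    # The body, if any, is sent as JSON.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


req = build_request("GET", "http", "localhost", 9200, "_cluster/health", {"pretty": "true"})
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would actually send it, assuming a node is listening on localhost:9200.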

Check to see if the ES node is working properly:

[root@Node2 ~]# curl localhost:9200
{
  "name" : "node2",
  "cluster_name" : "RK",
  "cluster_uuid" : "pwffDjOKQT6Ss2CRQLXt0g",
  "version" : {
    "number" : "5.3.0",
    "build_hash" : "3adb13b",
    "build_date" : "2017-03-23T03:31:50.652Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.1"
  },
  "tagline" : "You Know, for Search"
}

The response is in JSON format; receiving it means the node is working normally.

Elasticsearch uses JSON as the data format for communication, which is friendly to developers because so many languages support JSON. JavaScript goes without saying, Java has fastjson, and Ruby and other languages ship with JSON support.

1) _cat API

Elasticsearch exposes a great deal of information, and it is difficult to see the relationships in such complex data with the naked eye. The cat commands were created for this reason: they help developers quickly look up information about Elasticsearch.

View all endpoints of the _cat API:

[root@Node2 ~]# curl 192.168.10.7:9200/_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates

Verbose

Each command accepts the ?v parameter to display verbose output, including the column headers:

$ curl localhost:9200/_cat/master?v
id                     host      ip        node
QG6QrX32QSi8C3-xQmrSoA 127.0.0.1 127.0.0.1 Manslaughter

[root@Node2 ~]# curl 192.168.10.2:9200/_cat/master?v
id                     host          ip            node
bMAYDjb2Rsyfpn92Lnax3w 192.168.2.116 192.168.2.116 node7

Help

Each command accepts the help parameter to list the columns it can display:

[root@Node2 ~]# curl 192.168.10.2:9200/_cat/master?help
id   |   | node id
host | h | host name
ip   |   | ip address
node | n | node name

With the h parameter, you can specify the columns to output:

$ curl localhost:9200/_cat/master?h=host,ip,node
127.0.0.1 127.0.0.1 Manslaughter

[root@Node2 ~]# curl 192.168.10.2:9200/_cat/master?h=host,id
192.168.2.116 bMAYDjb2Rsyfpn92Lnax3w

Formatting of numeric types

Many commands return sizes in a human-readable form, using units such as mb or kb.

$ curl localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test    5   1          3            0       9.kb           9.kb

Example:

View the shard allocation of the ES cluster:

[root@Node2 ~]# curl 192.168.10.7:9200/_cat/allocation
9 38.8mb 4.7gb 13gb  17.7gb 26 192.168.2.116 192.168.2.116 node7
9 38.8mb 9.1gb 8.6gb 17.7gb 51 192.168.2.114 192.168.2.114 node2

[root@Node2 ~]# curl 192.168.10.7:9200/_cat/allocation?v # ?v shows the column names
shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
     9       38.8mb     9.1gb      8.6gb     17.7gb           51 192.168.2.114 192.168.2.114 node2
     9       38.8mb     4.7gb       13gb     17.7gb           26 192.168.2.116 192.168.2.116 node7

View the ES cluster node information:

[root@Node2 ~]# curl 192.168.10.7:9200/_cat/nodes?v
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.114           10          96   1    0.00    0.01     0.00 mdi       -      node2
192.168.2.116            3         100   0    0.00    0.00     0.00 mdi       *      node7

heap.percent: percentage of the heap memory in use

cpu: recent CPU usage, as a percentage

node.role: the roles the node can play; mdi means master-eligible (m), data (d) and ingest (i)

master: indicates whether the node is currently the master; * marks the current master, - a node that is not.
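Since _cat output with ?v is a plain whitespace-separated table, it is easy to process in a script. The sketch below is one possible way to do it; parse_cat_table is a hypothetical helper, and it assumes that no cell is empty or contains spaces, which holds for the sample above but not necessarily for every _cat endpoint.

```python
def parse_cat_table(text: str) -> list[dict[str, str]]:
    """Turn `_cat/...?v` output into one dict per row, keyed by column header."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.2.114 10 96 1 0.00 0.01 0.00 mdi - node2
192.168.2.116 3 100 0 0.00 0.00 0.00 mdi * node7
"""

rows = parse_cat_table(sample)
# The current master is the row whose "master" column is "*".
master = next(row["name"] for row in rows if row["master"] == "*")
print(master)  # node7
```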

2) _cluster API

Cluster-related API endpoints

[root@Node2 ~]# curl localhost:9200/_cluster/health
{"cluster_name":"RK","status":"green","timed_out":false,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":9,"active_shards":18,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}

[root@Node2 ~]# curl localhost:9200/_cluster/health?pretty # ?pretty prints easy-to-read JSON
{
  "cluster_name" : "RK",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 9,
  "active_shards" : 18,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

The most important line of the output is status. Many open source ES monitoring scripts use this single value to decide whether to raise an alarm. There are three possible values for status:

green: green light. All shards are running correctly and the cluster is healthy.

yellow: yellow light. All primary shards are running correctly, but some replica shards are missing. ES is currently working normally, but there is some risk. Note that in the server startup logic of Kibana4, Kibana4 refuses to start even when the light is yellow, and waits in a loop for the cluster status to turn green before it continues.

red: red light. A primary shard is missing, and that part of the data is completely unavailable. Because ES routes writes with a simple modulo algorithm, writes that are routed to this shard will keep failing.

Readers who are familiar with Nagios can map red, yellow and green directly to Critical, Warning and OK in Nagios.
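The mapping to Nagios can be sketched as follows. The exit codes 0, 1, 2 and 3 are the standard Nagios plugin codes for OK, WARNING, CRITICAL and UNKNOWN; the function name and the sample dictionary are illustrative assumptions.

```python
NAGIOS_EXIT_CODES = {"green": 0, "yellow": 1, "red": 2}  # OK, WARNING, CRITICAL
NAGIOS_UNKNOWN = 3


def nagios_exit_code(health: dict) -> int:
    """Map the status field of a _cluster/health response to a Nagios exit code."""
    return NAGIOS_EXIT_CODES.get(health.get("status"), NAGIOS_UNKNOWN)


# A trimmed-down health response like the one shown above.
print(nagios_exit_code({"cluster_name": "RK", "status": "green"}))  # 0
```

A response without a recognizable status maps to UNKNOWN rather than OK, so a broken check does not silently pass.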

Explanation of the other fields

number_of_nodes: the total number of nodes in the cluster.

number_of_data_nodes: the total number of data nodes in the cluster.

active_primary_shards: the total number of primary shards of all indexes in the cluster.

active_shards: the total number of shards of all indexes in the cluster.

relocating_shards: the number of shards that are being migrated.

initializing_shards: the number of shards that are being initialized.

unassigned_shards: the number of shards that are not assigned to a specific node.

delayed_unassigned_shards: the number of shards whose assignment to a specific node is delayed.

Obviously, the last four items should be 0 under normal circumstances. But if they stay non-zero for a long time, how can we tell which indexes own the shards that remain unassigned or initializing? There are more interfaces to get relevant information later in this book. Even at the level of cluster health, however, you can get somewhat more detail.

The level request parameter: when calling the API, you can append a level parameter to specify whether the output is shown at the indices or shards level. There are three levels: cluster, indices and shards (an unrecognized value falls back to the cluster level).

Generally speaking, the indices level is sufficient.
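Given a response requested with level=indices, picking out the affected indexes takes only a few lines. This is a minimal sketch: indices_needing_attention is a hypothetical helper, and the sample response is made up for illustration (it is not output from the cluster in this article).

```python
def indices_needing_attention(health: dict) -> list[str]:
    """List indexes that are not green or that still have unassigned or initializing shards."""
    flagged = []
    for name, info in health.get("indices", {}).items():
        if (info["status"] != "green"
                or info["unassigned_shards"] > 0
                or info["initializing_shards"] > 0):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical response body for: _cluster/health?level=indices
sample_health = {
    "status": "yellow",
    "indices": {
        ".kibana": {"status": "green", "unassigned_shards": 0, "initializing_shards": 0},
        "students": {"status": "yellow", "unassigned_shards": 5, "initializing_shards": 0},
    },
}
print(indices_needing_attention(sample_health))  # ['students']
```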

Health: health information

curl 'localhost:9200/_cluster/health?pretty'

[root@Node2 ~]# curl localhost:9200/_cluster/health?level=nodes # nodes is not a valid level, so the cluster level is returned
{"cluster_name":"RK","status":"green","timed_out":false,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":9,"active_shards":18,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}

[root@Node2 ~]# curl 'localhost:9200/_cluster/health?level=cluster&pretty' # to combine level with pretty, append &pretty and quote the URL
{
  "cluster_name" : "RK",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 9,
  "active_shards" : 18,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

[root@Node2 ~]# curl 'localhost:9200/_cluster/health?level=indices&pretty'
{
  "cluster_name" : "RK",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 9,
  "active_shards" : 18,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0,
  "indices" : {
    ".monitoring-es-2-2017.04.20" : { "status" : "green", "number_of_shards" : 1, "number_of_replicas" : 1, "active_primary_shards" : 1, "active_shards" : 2, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0 },
    ".kibana" : { "status" : "green", "number_of_shards" : 1, "number_of_replicas" : 1, "active_primary_shards" : 1, "active_shards" : 2, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0 },
    ".monitoring-data-2" : { "status" : "green", "number_of_shards" : 1, "number_of_replicas" : 1, "active_primary_shards" : 1, "active_shards" : 2, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0 },
    ".monitoring-es-2-2017.04.17" : { "status" : "green", "number_of_shards" : 1, "number_of_replicas" : 1, "active_primary_shards" : 1, "active_shards" : 2, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0 },
    ".monitoring-kibana-2-2017.04.20" : { "status" : "green", "number_of_shards" : 1, "number_of_replicas" : 1, "active_primary_shards" : 1, "active_shards" : 2, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0 },
    ".monitoring-es-2-2017.04.19" : { "status" : "green", "number_of_shards" : 1, "number_of_replicas" : 1, "active_primary_shards" : 1, "active_shards" : 2, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0 },
    ".monitoring-es-2-2017.04.18" : { "status" : "green", "number_of_shards" : 1, "number_of_replicas" : 1, "active_primary_shards" : 1, "active_shards" : 2, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0 },
    ".monitoring-kibana-2-2017.04.18" : { "status" : "green", "number_of_shards" : 1, "number_of_replicas" : 1, "active_primary_shards" : 1, "active_shards" : 2, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0 },
    ".monitoring-kibana-2-2017.04.19" : { "status" : "green", "number_of_shards" : 1, "number_of_replicas" : 1, "active_primary_shards" : 1, "active_shards" : 2, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0 }
  }
}

Stats: cluster statistics

curl 'localhost:9200/_cluster/stats'

Statistics of cluster nodes:

curl 'localhost:9200/_nodes/stats'

[root@Node2 ~]# curl 'localhost:9200/_cluster/stats?pretty'
{
  "_nodes" : { "total" : 2, "successful" : 2, "failed" : 0 },
  "cluster_name" : "RK",
  "timestamp" : 1492754076865,
  "status" : "green",
  "indices" : {
    "count" : 9,
    "shards" : {
      "total" : 18,
      "primaries" : 9,
      "replication" : 1.0,
      "index" : {
        "shards" : { "min" : 2, "max" : 2, "avg" : 2 },
        "primaries" : { "min" : 1, "max" : 1, "avg" : 1.0 },
        "replication" : { "min" : 1.0, "max" : 1.0, "avg" : 1.0 }
      }
    },
    "docs" : { "count" : 93528, "deleted" : 286 },
    "store" : { "size_in_bytes" : 81565430, "throttle_time_in_millis" : 0 },
    "fielddata" : { "memory_size_in_bytes" : 6608, "evictions" : 0 },
    "query_cache" : { "memory_size_in_bytes" : 0, "total_count" : 16, "hit_count" : 0, "miss_count" : 16, "cache_size" : 0, "cache_count" : 0, "evictions" : 0 },
    "completion" : { "size_in_bytes" : 0 },
    "segments" : {
      "count" : 96,
      "memory_in_bytes" : 902530,
      "terms_memory_in_bytes" : 557988,
      "stored_fields_memory_in_bytes" : 38640,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 3712,
      "points_memory_in_bytes" : 52750,
      "doc_values_memory_in_bytes" : 249440,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : { "total" : 2, "data" : 2, "coordinating_only" : 0, "master" : 2, "ingest" : 2 },
    "versions" : [ "5.3.0" ],
    "os" : {
      "available_processors" : 3,
      "allocated_processors" : 3,
      "names" : [ { "name" : "Linux", "count" : 2 } ],
      "mem" : { "total_in_bytes" : 4227088384, "free_in_bytes" : 94605312, "used_in_bytes" : 4132483072, "free_percent" : 2, "used_percent" : 98 }
    },
    "process" : {
      "cpu" : { "percent" : 0 },
      "open_file_descriptors" : { "min" : 188, "max" : 214 }
    },
    "jvm" : {
      "max_uptime_in_millis" : 30736656,
      "versions" : [ { "version" : "1.8.0_121", "vm_name" : "OpenJDK 64-Bit Server VM", "vm_version" : "25.121-b13", "vm_vendor" : "Oracle Corporation", "count" : 2 } ],
      "mem" : { "heap_used_in_bytes" : 224113584, "heap_max_in_bytes" : 4268818432 },
      "threads" : 73
    },
    "fs" : { "total_in_bytes" : 38102884352, "free_in_bytes" : 25168912384, "available_in_bytes" : 23233347584, "spins" : "true" },
    "plugins" : [ { "name" : "x-pack", "version" : "5.3.0", "description" : "Elasticsearch Expanded Pack Plugin", "classname" : "org.elasticsearch.xpack.XPackPlugin" } ],
    "network_types" : {
      "transport_types" : { "netty4" : 2 },
      "http_types" : { "netty4" : 2 }
    }
  }
}

Cluster state information:

curl 'localhost:9200/_cluster/state/METRICS?pretty'

Metrics:

version

master_node

nodes

routing_table

metadata

blocks

[root@Node2 ~]# curl 'localhost:9200/_cluster/state?pretty'

[root@Node2 ~]# curl 'localhost:9200/_cluster/state/version?pretty'
{
  "cluster_name" : "RK",
  "version" : 19,
  "state_uuid" : "tXW8CtBXS1a3Sn1wCBci2g"
}

[root@Node2 ~]# curl 'localhost:9200/_cluster/state/master_node?pretty'
{
  "cluster_name" : "RK",
  "master_node" : "bMAYDjb2Rsyfpn92Lnax3w"
}

[root@Node2 ~]# curl 'localhost:9200/_cat/master'
bMAYDjb2Rsyfpn92Lnax3w 192.168.2.116 192.168.2.116 node7

[root@Node2 ~]# curl 'localhost:9200/_cat/nodes'
192.168.2.114 7 96 0 0.00 0.00 mdi - node2
192.168.2.116 5 100 0 0.00 0.00 mdi * node7

III. ES terminology in detail

The act of storing data in Elasticsearch is called indexing, but before indexing we need to know where the data should be stored.

In Elasticsearch, a document belongs to a type (type), and types live inside an index (index). We can draw a simple comparison with a traditional relational database:

Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields

An Elasticsearch cluster can contain multiple indexes (databases), each index can contain multiple types (tables), each type contains multiple documents (rows), and each document contains multiple fields (columns).

The different meanings of "index"

You may have noticed that the word "index" has different meanings in Elasticsearch, so it is worth distinguishing them here:

Index (noun): as mentioned above, an index is like a database in a traditional relational database. It is where related documents are stored. The plural of index is indices or indexes.

Index (verb): to "index a document" means to store a document in an index (noun) so that it can be retrieved or queried. This is much like the INSERT keyword in SQL, except that if the document already exists, the new document overwrites the old one.

Inverted index: traditional databases add an index, such as a B-Tree index, to a specific column to speed up retrieval. Elasticsearch and Lucene use a data structure called an inverted index to achieve the same goal.
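To make the idea of an inverted index concrete, here is a toy version in a few lines: a mapping from each term to the set of documents that contain it. It is only an illustration of the data structure, not how Lucene stores it, and the lowercase-and-split step is a deliberately naive stand-in for real text analysis.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased, whitespace-separated term to the IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


docs = {"1": "Xianglong Shiba Zhang", "2": "Luoying Shenjian"}
index = build_inverted_index(docs)
print(sorted(index["xianglong"]))  # ['1']
```

Looking up a term is then a single dictionary access, instead of a scan over every document.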

IV. APIs for CRUD operations

Create:

[root@Node2 ~]# curl -X PUT 'localhost:9200/students/class1/1?pretty' -d '{"first_name": "Jing", "last_name": "Guo", "gender": "Male", "courses": "Xianglong Shiba Zhang"}'
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 },
  "created" : true
}

[root@Node2 ~]# curl localhost:9200/_cat/indices?v
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .monitoring-es-2-2017.04.19     EMZBcpI7RV6V9aZZ46KSWw   1   1      39720           84     34.1mb           17mb
green  open   .monitoring-kibana-2-2017.04.20 JqpaERNnQwOhTpVu0F-yCA   1   1        169            0    285.9kb        142.9kb
green  open   .monitoring-es-2-2017.04.20     nLnAKVKKSOavBc7dc2mOgA   1   1       5536          126      5.5mb          2.7mb
green  open   .kibana                         FX7h92rvRwuxxPAYSeORTw   1   1          1            0      6.3kb          3.1kb
green  open   .monitoring-es-2-2017.04.18     FuWnLKlRRoWlMKznkAle2w   1   1      38676           50     31.8mb         15.9mb
green  open   .monitoring-kibana-2-2017.04.19 5DNI-F44TJmaErmQ2qRHPQ   1   1       2121            0    969.3kb        484.6kb
green  open   .monitoring-es-2-2017.04.17     nyfdDSpsQKuyn4ZTAdm6aw   1   1       3874           24      3.2mb          1.6mb
green  open   students                        104QJMEGQjCtxMVI9rWvhQ   5   1          1            0     10.8kb          5.4kb
green  open   .monitoring-data-2              pIeAPjAQTuihoJenwmlTGA   1   1          3            2     16.6kb          8.3kb
green  open   .monitoring-kibana-2-2017.04.18 aV6CvYltR2aNl3URpPVDig   1   1       3428            0      1.6mb

We see that the path /students/class1/1 contains three pieces of information:

Name       Description
students   the index name
class1     the type name
1          the student's ID (the document is replaced if the ID already exists)

The request body (a JSON document) contains all the information about the student: his name is "Guo Jing", he is male, and he is learning Xianglong Shiba Zhang.

It's easy! No additional administrative steps are required, such as creating the index or defining the data type of each field. We can index documents directly: Elasticsearch ships with defaults for everything, and all the administrative work happens transparently.

Much like CRUD in MongoDB, indexes and types do not need to be created in advance.

Next, let's add another student to the index:

[root@Node2 ~]# curl -X PUT 'localhost:9200/students/class1/2?pretty' -d '{"first_name": "Rong", "last_name": "Huang", "gender": "Female", "age": 23, "courses": "Luoying Shenjian"}'
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "2",
  "_version" : 1,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 },
  "created" : true
}

Retrieve a document: GET method

[root@Node2 ~]# curl localhost:9200/students/class1/1?pretty
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "Jing",
    "last_name" : "Guo",
    "gender" : "Male",
    "courses" : "Xianglong Shiba Zhang"
  }
}

[root@Node2 ~]# curl localhost:9200/students/?pretty # shows the aliases, mappings and settings of the index
{
  "students" : {
    "aliases" : { },
    "mappings" : {
      "class1" : {
        "properties" : {
          "age" : { "type" : "long" },
          "courses" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
          "first_name" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
          "gender" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
          "last_name" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1492762539451",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "104QJMEGQjCtxMVI9rWvhQ",
        "version" : { "created" : "5030099" },
        "provided_name" : "students"
      }
    }
  }
}

Update the document:

The PUT method overwrites the original document

It replaces the existing document that has the same ID

The POST method with the _update API updates only part of the document; a GET afterwards confirms the change

[root@Node2 ~]# curl -X POST 'localhost:9200/students/class1/2/_update?pretty' -d '
> {
> "doc": {"age": 22}
> }'
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "2",
  "_version" : 2,
  "result" : "updated",
  "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }
}

[root@Node2 ~]# curl localhost:9200/students/class1/2?pretty
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "2",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "first_name" : "Rong",
    "last_name" : "Huang",
    "gender" : "Female",
    "age" : 22,
    "courses" : "Luoying Shenjian"
  }
}
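The difference between the two update styles can be modelled with a plain dictionary. This is only a sketch of the observable behaviour (replace versus merge, with the version incremented each time); the store and both function names are hypothetical, and the real _update API also merges nested objects, which is not modelled here.

```python
def put_document(store: dict, doc_id: str, source: dict) -> dict:
    """PUT semantics: the new source replaces the old document entirely."""
    version = store[doc_id]["_version"] + 1 if doc_id in store else 1
    store[doc_id] = {"_version": version, "_source": dict(source)}
    return store[doc_id]


def update_document(store: dict, doc_id: str, partial: dict) -> dict:
    """_update semantics: only the given top-level fields change, the rest are kept."""
    current = store[doc_id]
    merged = {**current["_source"], **partial}
    store[doc_id] = {"_version": current["_version"] + 1, "_source": merged}
    return store[doc_id]


store = {}
put_document(store, "2", {"first_name": "Rong", "last_name": "Huang", "age": 23})
result = update_document(store, "2", {"age": 22})
print(result["_version"], result["_source"]["age"], result["_source"]["first_name"])  # 2 22 Rong
```

A second put_document call with only {"age": 30} would drop first_name and last_name, which is why a partial update is the safer choice when only one field changes.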

Delete documents: DELETE method

[root@Node2 ~]# curl -X DELETE localhost:9200/students/class1/1?pretty
{
  "found" : true,
  "_index" : "students",
  "_type" : "class1",
  "_id" : "1",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }
}

[root@Node2 ~]# curl localhost:9200/students/class1/1?pretty
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "1",
  "found" : false
}

Delete the index:

[root@Node2 ~]# curl localhost:9200/_cat/indices?v
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .monitoring-es-2-2017.04.19     EMZBcpI7RV6V9aZZ46KSWw   1   1      39720           84     34.1mb           17mb
green  open   .monitoring-kibana-2-2017.04.20 JqpaERNnQwOhTpVu0F-yCA   1   1        169            0    285.9kb        142.9kb
green  open   .monitoring-es-2-2017.04.20     nLnAKVKKSOavBc7dc2mOgA   1   1       5536          126      5.5mb          2.7mb
green  open   .kibana                         FX7h92rvRwuxxPAYSeORTw   1   1          1            0      6.3kb          3.1kb
green  open   .monitoring-es-2-2017.04.18     FuWnLKlRRoWlMKznkAle2w   1   1      38676           50     31.8mb         15.9mb
green  open   .monitoring-kibana-2-2017.04.19 5DNI-F44TJmaErmQ2qRHPQ   1   1       2121            0    969.3kb        484.6kb
green  open   .monitoring-es-2-2017.04.17     nyfdDSpsQKuyn4ZTAdm6aw   1   1       3874           24      3.2mb          1.6mb
green  open   .monitoring-data-2              pIeAPjAQTuihoJenwmlTGA   1   1          3            2     16.6kb          8.3kb
green  open   .monitoring-kibana-2-2017.04.18 aV6CvYltR2aNl3URpPVDig   1   1       3428            0      1.6mb        850.9kb
green  open   students                        104QJMEGQjCtxMVI9rWvhQ   5   1          0            0     21.6kb         10.8kb

[root@Node2 ~]# curl -X DELETE localhost:9200/students?pretty
{
  "acknowledged" : true
}

[root@Node2 ~]# curl localhost:9200/_cat/indices?v
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .monitoring-es-2-2017.04.19     EMZBcpI7RV6V9aZZ46KSWw   1   1      39720           84     34.1mb           17mb
green  open   .monitoring-kibana-2-2017.04.20 JqpaERNnQwOhTpVu0F-yCA   1   1        169            0    285.9kb        142.9kb
green  open   .monitoring-es-2-2017.04.20     nLnAKVKKSOavBc7dc2mOgA   1   1       5536          126      5.5mb          2.7mb
green  open   .kibana                         FX7h92rvRwuxxPAYSeORTw   1   1          1            0      6.3kb          3.1kb
green  open   .monitoring-es-2-2017.04.18     FuWnLKlRRoWlMKznkAle2w   1   1      38676           50     31.8mb         15.9mb
green  open   .monitoring-kibana-2-2017.04.19 5DNI-F44TJmaErmQ2qRHPQ   1   1       2121            0    969.3kb        484.6kb
green  open   .monitoring-es-2-2017.04.17     nyfdDSpsQKuyn4ZTAdm6aw   1   1       3874           24      3.2mb          1.6mb
green  open   .monitoring-kibana-2-2017.04.18 aV6CvYltR2aNl3URpPVDig   1   1       3428            0      1.6mb        850.9kb
green  open   .monitoring-data-2              pIeAPjAQTuihoJenwmlTGA   1   1          3

There is no operation on the type?

Search (query) data

Querying data uses the _search API of ES

Query DSL: a domain-specific query language based on JSON, used to implement many types of query operations

For example: simple queries, fuzzy queries, range queries, Boolean queries, etc.

An ES query is executed in two phases:

Scatter phase:

Merge phase:
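The two phases can be pictured with a toy model: each shard returns its own best hits, and the node that received the request merges them into one global result. Everything below is an assumption made for illustration: shards are plain dictionaries of document ID to score, and scatter_gather is a hypothetical function, not an ES API.

```python
import heapq


def search_shard(shard_docs: dict, size: int) -> list:
    """Scatter step: one shard returns its own top `size` hits as (score, doc_id) pairs."""
    return heapq.nlargest(size, ((score, doc_id) for doc_id, score in shard_docs.items()))


def scatter_gather(shards: list, size: int) -> list:
    """Query every shard, then merge the partial results into the global top `size`."""
    per_shard = [search_shard(docs, size) for docs in shards]                     # scatter
    merged = heapq.nlargest(size, (hit for hits in per_shard for hit in hits))    # merge
    return [doc_id for _score, doc_id in merged]


# Three shards with made-up relevance scores for some query.
shards = [{"a": 0.9, "b": 0.2}, {"c": 0.7}, {"d": 0.95, "e": 0.1}]
print(scatter_gather(shards, 2))  # ['d', 'a']
```

Each shard only needs to return `size` candidates, because no document outside a shard's own top `size` can appear in the global top `size`.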

The data searched in ES can be broadly divided into two categories:

Exact values: a search on a field that holds exact values is an exact match.

Exact value: the original, unprocessed value. For example, Notebook and notebook are not the same exact value.

Full-text: full-text search

Full-text search looks at the text itself to determine how well a document matches the query, that is, it evaluates the relevance between the document and the user's query. To support full-text search, ES must first analyze the text and build an inverted index. The data in the inverted index also needs to be "normalized" into a standard format.

Query method:

There are two ways to send a query request to ES:

1. Through the RESTful request API, also known as a query string

2. By sending a REST request body

List all documents in the index:

Using the RESTful request API:

[root@Node2 ~]# curl localhost:9200/students/_search?pretty
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "students", "_type" : "class1", "_id" : "2", "_score" : 1.0, "_source" : { "first_name" : "Rong", "last_name" : "Huang", "gender" : "Female", "age" : 23, "courses" : "Luoying Shenjian" } },
      { "_index" : "students", "_type" : "class1", "_id" : "1", "_score" : 1.0, "_source" : { "first_name" : "Jing", "last_name" : "Guo", "gender" : "Male", "courses" : "Xianglong Shiba Zhang" } }
    ]
  }
}

Here took is the execution time in milliseconds, and timed_out indicates whether the query timed out.

Using a REST request body:

[root@Node2 ~]# curl 'localhost:9200/students/_search?pretty' -d '
> {
> "query": {"match_all": {}}
> }'
{
  "took" : 93,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "students", "_type" : "class1", "_id" : "2", "_score" : 1.0, "_source" : { "first_name" : "Rong", "last_name" : "Huang", "gender" : "Female", "age" : 23, "courses" : "Luoying Shenjian" } },
      { "_index" : "students", "_type" : "class1", "_id" : "1", "_score" : 1.0, "_source" : { "first_name" : "Jing", "last_name" : "Guo", "gender" : "Male", "courses" : "Xianglong Shiba Zhang" } },
      { "_index" : "students", "_type" : "S1", "_id" : "yw", "_score" : 1.0, "_source" : { "first_name" : "xiejun", "age" : 27 } }
    ]
  }
}

Multi-index, multi-type query:

/_search: all indexes

/INDEX_NAME/_search: single index

/INDEX1_NAME,INDEX2_NAME/_search: multiple indexes

/s*,t*/_search: all indexes whose names match the wildcards

/students/class1/_search: single type search

/students/class1,class2/_search: multi-type search
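The path patterns above follow one simple rule: an optional comma-separated list of indexes, an optional comma-separated list of types, then _search. A small sketch, with search_path as a hypothetical helper, makes the rule explicit. It assumes that _all can be used as the index segment when only types are given, as in the ES versions that still support types.

```python
def search_path(indices=(), types=()) -> str:
    """Build the path of a _search request for the given indexes and types."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        parts.append("_all")  # a type list needs an index segment in front of it
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())                                    # /_search
print(search_path(["students"]))                        # /students/_search
print(search_path(["students"], ["class1", "class2"]))  # /students/class1,class2/_search
```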

Mapping and Analysis

For each document, ES takes the values of all its fields and generates a field called "_all". When a query_string query does not specify a field, the query runs against the _all field.

Example:

GET /_search?q='Xianglong' # no field is specified, so the query runs against the _all field

GET /_search?q='Xianglong%20Shiba%20Zhang' # the spaces must be encoded as %20

GET /_search?q=courses:'Xianglong' # the query runs against the specified field

GET /_search?q=courses:'Xianglong%20Shiba%20Zhang'
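Rather than encoding spaces by hand, the q parameter can be built with the standard library. This is a minimal sketch; query_string_search is a hypothetical helper, and the colon is deliberately left unencoded so that the result looks like the examples above.

```python
from urllib.parse import quote


def query_string_search(query: str, field: str = None, index: str = None) -> str:
    """Build a /_search?q=... path, percent-encoding spaces and other unsafe characters."""
    q = f"{field}:{query}" if field else query
    prefix = f"/{index}" if index else ""
    return f"{prefix}/_search?q={quote(q, safe=':')}"


print(query_string_search("Xianglong Shiba Zhang", field="courses"))
# /_search?q=courses:Xianglong%20Shiba%20Zhang
```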

Data types:

string, numbers, boolean, dates

View the data types of the fields in the documents:

[root@Node2 ~]# curl "localhost:9200/students/class1/_mapping?pretty"
{
  "students" : {
    "mappings" : {
      "class1" : {
        "properties" : {
          "age" : { "type" : "long" },
          "courses" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
          "first_name" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
          "gender" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
          "last_name" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }
        }
      }
    }
  }
}

The process of creating an inverted index:

Tokenization, then normalization. This process is called analysis, and it is carried out by an analyzer.

An analyzer consists of three components:

Character filters, a tokenizer, and token filters
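To make the three stages concrete, here is a toy sketch in which each stage is one small function and the result feeds an inverted index (term to document ids). The specific rules chosen here (replacing "&", splitting on non-alphanumerics, lowercasing) are assumptions for illustration and are not the behavior of any particular ES analyzer.

```python
import re
from collections import defaultdict


def char_filter(text):
    # character filter: tidy the raw string before tokenizing (here: '&' -> 'and')
    return text.replace("&", " and ")


def tokenizer(text):
    # tokenizer: split the string into terms on non-alphanumeric characters
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def token_filter(tokens):
    # token filter: normalize each term (here: lowercase)
    return [t.lower() for t in tokens]


def analyze(text):
    return token_filter(tokenizer(char_filter(text)))


def build_inverted_index(docs, field):
    """Map every analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in analyze(doc.get(field, "")):
            index[term].add(doc_id)
    return index


docs = {
    "1": {"courses": "Xianglong Shiba Zhang"},
    "2": {"courses": "Luoying Shenjian"},
}
index = build_inverted_index(docs, "courses")
print(analyze("Xianglong Shiba Zhang"))
print(sorted(index["xianglong"]))
```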

Analyzers built into ES:

Standard analyzer: the default analyzer in ES; suitable for most languages, it splits text on Unicode word boundaries

Simple analyzer: splits text at every non-letter character (non-letters act as word boundaries)

Whitespace analyzer: uses only whitespace as word boundaries

Language analyzers: analyzers tailored to specific languages

An analyzer is used not only when the index is built, but also when a query is executed
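The difference between these analyzers is easiest to see on one sample string. The functions below are rough stand-ins written for this note, not the real implementations: they handle ASCII letters only, and standard_like merely approximates Unicode word splitting with a regular expression. The last lines illustrate why the same analyzer is applied to the query text: an uppercase query only matches lowercased index terms after it has been analyzed too.

```python
import re


def standard_like(text):
    # rough stand-in for the standard analyzer: word characters, lowercased
    return [t.lower() for t in re.findall(r"\w+", text)]


def simple_like(text):
    # simple analyzer: every non-letter is a word boundary; terms are lowercased
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def whitespace_like(text):
    # whitespace analyzer: split on whitespace only, terms left unchanged
    return text.split()


text = "Guo-Jing learns 18 palms"
print(standard_like(text))
print(simple_like(text))
print(whitespace_like(text))

# Query time: analyze the query with the same analyzer used for indexing.
indexed_terms = set(simple_like(text))
query_terms = simple_like("GUO")
print(all(term in indexed_terms for term in query_terms))
```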

Request body:

Request-body searches fall into two categories:

Query DSL: used for full-text queries; results are ranked by how relevant each document is

Executing a query is comparatively expensive, and the results are not cached

Filter DSL: used for exact-value queries; each document either matches or does not ("yes" or "no")

Filters are fast, and their results can be cached
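The contrast between the two categories can be sketched in a few lines: a filter yields a set of matching ids, which is cheap to cache, while a query yields a score per document. Everything here is a toy model under stated assumptions: the in-memory DOCS, the use of functools.lru_cache as the "cache", and the score (fraction of query words found) have nothing to do with how ES caches or scores internally.

```python
from functools import lru_cache

DOCS = {
    "1": {"first_name": "Jing", "gender": "Male", "courses": "Xianglong Shiba Zhang"},
    "2": {"first_name": "Rong", "gender": "Female", "courses": "Luoying Shenjian"},
}


@lru_cache(maxsize=None)
def gender_filter(value):
    # filter: a yes/no decision per document, stored as a reusable set of ids
    return frozenset(i for i, d in DOCS.items() if d.get("gender") == value)


def courses_query(text):
    # query: a relevance score per matching document (toy score, see lead-in)
    words = text.lower().split()
    scores = {}
    for doc_id, doc in DOCS.items():
        terms = doc.get("courses", "").lower().split()
        found = sum(word in terms for word in words)
        if found:
            scores[doc_id] = found / len(words)
    return scores


print(gender_filter("Female"))
print(gender_filter("Female"))             # second call is answered from the cache
print(gender_filter.cache_info().hits)
print(courses_query("xianglong jian"))
```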

Filter dsl:

Term filter: exactly matches documents that contain the specified term

For example:

{
  "query": {
    "term": {"name": "Guo"}
  }
}

[root@Node2 ~]# curl "localhost:9200/students/_search?pretty" -d '{"query": {"term": {"name": "Guo"}}}'
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
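The request above finds nothing because the sample documents have first_name and last_name fields but no name field. A second pitfall is worth sketching: the term itself is not analyzed, while a text field is indexed as analyzed (lowercased) terms, so "Guo" does not equal the stored "guo". The toy model below assumes the text-plus-keyword layout shown in the mapping output earlier; the helper names are hypothetical.

```python
DOCS = {
    "1": {"first_name": "Jing", "last_name": "Guo"},
    "2": {"first_name": "Rong", "last_name": "Huang"},
}


def indexed_terms(doc, field):
    """Terms stored for a field: lowercased words for text, raw value for '.keyword'."""
    if field.endswith(".keyword"):
        value = doc.get(field[: -len(".keyword")])
        return set() if value is None else {value}
    value = doc.get(field)
    return set() if value is None else set(value.lower().split())


def term_filter(docs, field, term):
    # the term is NOT analyzed; it must equal a stored term exactly
    return [doc_id for doc_id, doc in docs.items() if term in indexed_terms(doc, field)]


print(term_filter(DOCS, "name", "Guo"))               # field does not exist
print(term_filter(DOCS, "last_name", "Guo"))          # stored term is lowercased
print(term_filter(DOCS, "last_name", "guo"))
print(term_filter(DOCS, "last_name.keyword", "Guo"))  # keyword keeps the raw value
```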

Terms filter: exact matching against multiple values

For example:

{
  "query": {
    "terms": {"name": ["Guo", "Rong"]}
  }
}
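In effect, a terms filter behaves like several term filters joined by OR: a document matches if its field equals any one of the listed values. A minimal sketch over in-memory documents follows; the sample values are stored already lowercased to keep analysis out of the picture, and the function name is hypothetical.

```python
import json

DOCS = {
    "1": {"first_name": "jing"},
    "2": {"first_name": "rong"},
    "yw": {"first_name": "xiejun"},
}


def terms_filter(docs, field, values):
    """Ids of documents whose field equals any one of the given values."""
    wanted = set(values)
    return [doc_id for doc_id, doc in docs.items() if doc.get(field) in wanted]


# The request body that expresses the same condition.
body = {"query": {"terms": {"first_name": ["jing", "rong"]}}}
print(json.dumps(body))
print(terms_filter(DOCS, "first_name", ["jing", "rong"]))
```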

Range filter: matches numeric or date values within a specified range

For example:

{
  "query": {
    "range": {
      "age": {"gte": 15, "lte": 25}
    }
  }
}

[root@Node2 ~]# curl "localhost:9200/students/_search?pretty" -d '{
> "query": {"range": {"age": {"gte": 15, "lte": 27}}}
> }'
{
  "took": 24,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {"_index": "students", "_type": "class1", "_id": "2", "_score": 1.0,
       "_source": {"first_name": "Rong", "last_name": "Huang", "gender": "Female", "age": 23, "courses": "Luoying Shenjian"}},
      {"_index": "students", "_type": "S1", "_id": "yw", "_score": 1.0,
       "_source": {"first_name": "xiejun", "age": 27}}
    ]
  }
}
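The result above can be reproduced with a small in-memory model, which also shows why the document without an age field (id 1) is left out. The sketch supports the four bound names gt, gte, lt and lte; the DOCS dict and the function name range_filter are assumptions for illustration.

```python
import operator

DOCS = {
    "1": {"first_name": "Jing"},                  # no age field
    "2": {"first_name": "Rong", "age": 23},
    "yw": {"first_name": "xiejun", "age": 27},
}

OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def range_filter(docs, field, **bounds):
    """Ids of documents whose field value satisfies every given bound."""
    hits = []
    for doc_id, doc in docs.items():
        value = doc.get(field)
        if value is not None and all(OPS[name](value, bound) for name, bound in bounds.items()):
            hits.append(doc_id)
    return hits


print(range_filter(DOCS, "age", gte=15, lte=27))
print(range_filter(DOCS, "age", gte=15, lte=25))
```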

Exists and missing filters: match documents in which the specified field has a value (exists) or has none (missing)

{
  "query": {
    "exists": {"field": "age"}
  }
}
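The two filters are complements of each other, as the sketch below shows on in-memory documents. One version note, stated with some caution: the text/keyword mapping shown earlier suggests an ES 5.x cluster, and in 5.0 the standalone missing query was removed in favor of an exists clause inside must_not, which is the second request body built here. The helper names are hypothetical.

```python
DOCS = {
    "1": {"first_name": "Jing"},                  # no age field
    "2": {"first_name": "Rong", "age": 23},
    "yw": {"first_name": "xiejun", "age": 27},
}


def exists_filter(docs, field):
    """Ids of documents in which the field has a value."""
    return [doc_id for doc_id, doc in docs.items() if doc.get(field) is not None]


def missing_filter(docs, field):
    """Ids of documents in which the field has no value."""
    return [doc_id for doc_id, doc in docs.items() if doc.get(field) is None]


exists_body = {"query": {"exists": {"field": "age"}}}
# "missing" expressed as must_not + exists
missing_body = {"query": {"bool": {"must_not": {"exists": {"field": "age"}}}}}

print(exists_filter(DOCS, "age"))
print(missing_filter(DOCS, "age"))
```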

Boolean filter: combines multiple filter clauses using boolean logic

must: every clause must match, that is, AND

Example:

"must": [
  {"term": {"age": 25}},
  {"term": {"gender": "Female"}}
]

must_not: none of the clauses may match, that is, NOT

"must_not": [
  {"term": {"age": 25}}
]

should: at least one clause must match, that is, OR

"should": [
  {"term": {"age": 25}},
  {"term": {"gender": "Female"}}
]
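The three clause types map directly onto all, not any, and any. The sketch below follows the description given in the text (should means at least one clause matches); in a real cluster the handling of should also depends on settings such as minimum_should_match, which this toy ignores. The helpers term and bool_filter are hypothetical names.

```python
DOCS = {
    "1": {"gender": "Male"},
    "2": {"gender": "Female", "age": 23},
    "yw": {"age": 27},
}


def term(field, value):
    """A clause: True when the document's field equals the value."""
    return lambda doc: doc.get(field) == value


def bool_filter(must=(), must_not=(), should=()):
    def matches(doc):
        if not all(clause(doc) for clause in must):        # AND
            return False
        if any(clause(doc) for clause in must_not):        # NOT
            return False
        if should and not any(clause(doc) for clause in should):  # OR
            return False
        return True
    return matches


def search(docs, flt):
    return [doc_id for doc_id, doc in docs.items() if flt(doc)]


print(search(DOCS, bool_filter(must=[term("age", 23), term("gender", "Female")])))
print(search(DOCS, bool_filter(must_not=[term("age", 23)])))
print(search(DOCS, bool_filter(should=[term("age", 27), term("gender", "Female")])))
```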

Query DSL:

match_all query: matches all documents. If no query is specified, match_all is the default.

{"match_all": {}}

match query: runs a full-text or exact-value query on almost any field

For a full-text query, the query string is analyzed first.

{"match": {"students": "Guo"}}

For an exact-value search, a filter is recommended instead of a query

{"match": {"name": "Guo"}}

multi_match query: runs the same query on multiple fields

{
  "multi_match": {
    "query": "Guo",
    "fields": ["name", "description"]
  }
}
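As a rough model, a match query analyzes the query text and looks for any of the resulting terms in one field, and multi_match repeats that over a list of fields. The sketch below uses a deliberately naive analyzer (lowercase, split on whitespace) and the sample documents' first_name and last_name fields; it returns only match or no match and leaves scoring aside. All function names are hypothetical.

```python
DOCS = {
    "1": {"first_name": "Jing", "last_name": "Guo"},
    "2": {"first_name": "Rong", "last_name": "Huang"},
}


def analyze(text):
    # naive analyzer for this sketch: lowercase, split on whitespace
    return text.lower().split()


def match(doc, field, text):
    """True when any analyzed query term occurs in the analyzed field."""
    field_terms = analyze(str(doc.get(field, "")))
    return any(term in field_terms for term in analyze(text))


def multi_match(doc, text, fields):
    """Run the same match query against several fields."""
    return any(match(doc, field, text) for field in fields)


def search(docs, text, fields):
    return [doc_id for doc_id, doc in docs.items() if multi_match(doc, text, fields)]


# The corresponding request body.
body = {"query": {"multi_match": {"query": "GUO", "fields": ["first_name", "last_name"]}}}

print(search(DOCS, "GUO", ["first_name", "last_name"]))
print(search(DOCS, "rong guo", ["first_name"]))
```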

Bool query: combines multiple query clauses using boolean logic. Unlike the bool filter, a query clause does not return "yes" or "no" but a relevance score, so the bool query combines the scores of its clauses:

must:

must_not:

should:
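How the scores are combined can be sketched with clauses that return a number instead of a boolean. In this toy model a clause scores 1.0 on a match and 0.0 otherwise, must and should scores are summed, and must_not only excludes. Real scoring is more involved (per-term relevance, and in older versions a coordination factor), so treat the numbers as illustrative; the names word and bool_score are hypothetical.

```python
def word(field, term):
    """Clause that scores 1.0 when the field contains the term, else 0.0."""
    def clause(doc):
        terms = str(doc.get(field, "")).lower().split()
        return 1.0 if term in terms else 0.0
    return clause


def bool_score(doc, must=(), must_not=(), should=()):
    """Combined score of a document, or None when the document is excluded."""
    if any(clause(doc) > 0 for clause in must_not):
        return None
    must_scores = [clause(doc) for clause in must]
    if any(score == 0 for score in must_scores):
        return None
    should_scores = [clause(doc) for clause in should]
    # without any must clause, at least one should clause has to match
    if not must and should and not any(score > 0 for score in should_scores):
        return None
    return sum(must_scores) + sum(should_scores)


rong = {"gender": "Female", "age": 23, "courses": "Luoying Shenjian"}
jing = {"gender": "Male", "courses": "Xianglong Shiba Zhang"}

query = dict(must=[word("gender", "female")],
             should=[word("courses", "luoying"), word("courses", "zhang")])
print(bool_score(rong, **query))
print(bool_score(jing, **query))
```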

Filter and query clauses can also be combined:

{
  "filtered": {
    "query": {"match": {"gender": "Female"}},
    "filter": {"term": {"age": 25}}
  }
}
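A version caveat applies here. The text/keyword mapping shown earlier suggests an ES 5.x cluster, and the filtered query was removed in 5.0; the same combination is written there as a bool query whose must clause carries the query and whose filter clause carries the filter. The sketch below builds both request bodies side by side; the function names are hypothetical.

```python
import json


def filtered_body(query, flt):
    """Request body in the older 'filtered' form described above."""
    return {"query": {"filtered": {"query": query, "filter": flt}}}


def bool_filter_body(query, flt):
    """Equivalent body using a bool query with a filter clause."""
    return {"query": {"bool": {"must": query, "filter": flt}}}


query = {"match": {"gender": "Female"}}
flt = {"term": {"age": 25}}

print(json.dumps(filtered_body(query, flt), indent=2))
print(json.dumps(bool_filter_body(query, flt), indent=2))
```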

Query syntax check:

GET /INDEX/_validate/query?pretty -d '
{
  ...
}'

Display details:

GET /INDEX/_validate/query?explain&pretty -d '
{
  ...
}'

[root@Node2 ~]# curl "localhost:9200/students/_validate/query?pretty" -d '{"query": {"range": {"age": {"gte": 15, "lte": 27}}}}'
{
  "valid": true,
  "_shards": {"total": 1, "successful": 1, "failed": 0}
}

[root@Node2 ~]# curl "localhost:9200/students/_validate/query?explain&pretty" -d '{"query": {"range": {"age": {"gte": 15, "lte": 27}}}}'
{
  "valid": true,
  "_shards": {"total": 1, "successful": 1, "failed": 0},
  "explanations": [
    {"index": "students", "valid": true, "explanation": "age:[15 TO 27]"}
  ]
}
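The same validation call can be prepared from a script. The sketch below only builds the request with Python's standard urllib and does not send it; the host default and the function name validate_request are assumptions. It uses POST because that is what curl sends when -d is given, and it sets a JSON Content-Type header, which newer ES versions insist on.

```python
import json
from urllib.request import Request


def validate_request(index, query, explain=False, host="http://localhost:9200"):
    """Build (but do not send) a request to the _validate/query endpoint."""
    params = "explain&pretty" if explain else "pretty"
    url = f"{host}/{index}/_validate/query?{params}"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


req = validate_request("students",
                       {"range": {"age": {"gte": 15, "lte": 27}}},
                       explain=True)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it against a running cluster: urllib.request.urlopen(req).read()
```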
