Common operations of ElasticSearch: documentation

Updated: 2025-07-06. From: SLTechnology News&Howtos (shulou)

1 New document

1.1 Specify id

PUT my_blog/article/1
{"id": 1, "title": "elasticsearch", "posttime": "2017-05-01", "content": "elasticsearch is helpfull!"}

Return:

{"_index": "my_blog", "_type": "article", "_id": "1", "_version": 1, "result": "created", "_shards": {"total": 2, "successful": 1, "failed": 0}, "created": true}

The version number is automatically incremented as the document is updated.

1.2 No id specified

If you do not specify an id, es generates one automatically. In that case you can only use POST:

POST my_blog/article
{"id": 2, "title": "spark", "posttime": "2017-05-01", "content": "spark is helpfull!"}

Return:

{"_index": "my_blog", "_type": "article", "_id": "AWagTCv8O1qbT1zqbREV", "_version": 1, "result": "created", "_shards": {"total": 2, "successful": 1, "failed": 0}, "created": true}

2 Get documents

Get an existing document:

GET my_blog/article/1

Return:

{"_index": "my_blog", "_type": "article", "_id": "1", "_version": 1, "found": true, "_source": {"id": 1, "title": "elasticsearch", "posttime": "2017-05-01", "content": "elasticsearch is helpfull!"}}

Get a document that does not exist:

GET my_blog/article/2

Return:

{"_index": "my_blog", "_type": "article", "_id": "2", "found": false}

2.2 Test whether the document exists

Use HEAD to test whether a document exists:

HEAD my_blog/article/1
200 - OK

HEAD my_blog/article/2
404 - Not Found

2.3 Batch retrieval

Different index, different type:

GET _mget
{"docs": [{"_index": "my_blog", "_type": "article", "_id": 1}, {"_index": "twitter", "_type": "tweet", "_id": 2}]}

Different type under the same index:

GET my_blog/_mget
{"docs": [{"_type": "article", "_id": 1}, {"_type": "essay", "_id": 2}]}

Same index, same type:

GET my_blog/article/_mget
{"docs": [{"_id": 1}, {"_id": 2}]}

Or:

GET my_blog/article/_mget
{"ids": [1, 2]}

3 Update the document

When es updates a document, it first finds the existing document, applies the change to its content, marks the old document as deleted, and then indexes the updated document as a new one.
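As a rough illustration of this read, modify, re-index cycle, the toy Python model below keeps every stored copy of a document in a list and only flags superseded copies as deleted. The names `ToyIndex`, `live`, `index` and `update` are invented for this sketch and do not belong to any Elasticsearch client; the real storage layer is far more involved and is not modelled here.

```python
import copy


class ToyIndex:
    """In-memory stand-in for an index; illustrative only."""

    def __init__(self):
        # Every copy ever written, newest last. Old copies are flagged, not erased.
        self.copies = []

    def live(self, doc_id):
        """Return the current (not deleted) copy of a document, or None."""
        for entry in self.copies:
            if entry["_id"] == doc_id and not entry["deleted"]:
                return entry
        return None

    def index(self, doc_id, source):
        """Store a full document, superseding any live copy with the same id."""
        old = self.live(doc_id)
        version = 1
        if old is not None:
            old["deleted"] = True
            version = old["_version"] + 1
        self.copies.append({"_id": doc_id, "_version": version,
                            "_source": copy.deepcopy(source), "deleted": False})
        return version

    def update(self, doc_id, changes):
        """Partial update: read the live copy, apply changes, re-index the result."""
        old = self.live(doc_id)
        if old is None:
            raise KeyError(doc_id)
        merged = copy.deepcopy(old["_source"])
        merged.update(changes)
        return self.index(doc_id, merged)


idx = ToyIndex()
idx.index("1", {"counter": 1, "tags": ["red"]})
new_version = idx.update("1", {"counter": 5})
print(new_version, idx.live("1")["_source"])
print(len(idx.copies), idx.copies[0]["deleted"])
```

After the update there are two stored copies: the first is flagged as deleted and the second carries the merged content with a higher version.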

Add the following document first:

PUT test/type1/1
{"counter": 1, "tags": ["red"]}

3.1 Update document field content

Add 4 to the value of counter:

POST test/type1/1/_update
{"script": {"inline": "ctx._source.counter += params.count", "lang": "painless", "params": {"count": 4}}}

Note1: inline holds the script source given directly in the request, ctx is the context object exposed to the script, painless is the scripting language built into es, and params is the collection of parameters passed to the script.

Note2: in addition to _source, the ctx object can also access _index, _type, _id, _version, _routing, _parent and other fields.
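To make the roles of ctx and params concrete, here is a small Python emulation of what a scripted update does to the test document. It is not Painless and does not talk to es; `run_update_script` and `add_to_counter` are hypothetical helpers written only for this sketch.

```python
import copy


def run_update_script(document, script, params):
    """Apply script(ctx, params) to a copy of the document and return the result.

    ctx mimics what an update script sees: the metadata fields plus the
    document body under "_source".
    """
    ctx = {key: value for key, value in document.items() if key != "_source"}
    ctx["_source"] = copy.deepcopy(document["_source"])
    script(ctx, params)
    updated = dict(document)
    updated["_source"] = ctx["_source"]
    return updated


def add_to_counter(ctx, params):
    # Python spelling of the Painless line: ctx._source.counter += params.count
    ctx["_source"]["counter"] += params["count"]


original = {"_index": "test", "_type": "type1", "_id": "1",
            "_source": {"counter": 1, "tags": ["red"]}}
updated = run_update_script(original, add_to_counter, {"count": 4})
print(updated["_source"])
```

Passing the increment through params rather than writing it into the script text keeps the script itself unchanged from one request to the next.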

Add a value to the tags field:

POST test/type1/1/_update
{"script": {"inline": "ctx._source.tags.add(params.tag)", "lang": "painless", "params": {"tag": "blue"}}}

3.2 Add and remove fields

Add a field called name to test/type1/1:

POST test/type1/1/_update
{"script": {"inline": "ctx._source.name = \"test\""}}

The body above can also be abbreviated to: {"script": "ctx._source.name = \"test\""}

Remove the name field:

POST test/type1/1/_update
{"script": {"inline": "ctx._source.remove(\"name\")"}}

3.3 Upsert operation

If the document does not exist, the content of upsert is used to create a new document; if the document exists, the script is executed as usual. For example:

POST test/type1/2/_update
{"script": {"inline": "ctx._source.counter += params.count", "lang": "painless", "params": {"count": 4}}, "upsert": {"counter": 1, "tag": ["pink"]}}

If test/type1/2 exists, counter is updated; if not, a new document containing the fields counter and tag is created.

Return:

{"_index": "test", "_type": "type1", "_id": "2", "_version": 1, "result": "created", "_shards": {"total": 2, "successful": 1, "failed": 0}}

3.4 doc-based update method

You can also use doc to update field contents or add new fields:

POST my_blog/article/1/_update
{"doc": {"title": "test new title"}}

POST my_blog/article/1/_update
{"doc": {"new field": "this is a new field"}}

4 Update by query

POST my_blog/_update_by_query
{"script": {"inline": "ctx._source.content = params.content", "lang": "painless", "params": {"content": "spark is popular"}}, "query": {"term": {"title": {"value": "spark"}}}}

Return:

{"took": 33, "timed_out": false, "total": 1, "updated": 1, "deleted": 0, "batches": 1, "version_conflicts": 0, "noops": 0, "retries": {"bulk": 0, "search": 0}, "throttled_millis": 0, "requests_per_second": -1, "throttled_until_millis": 0, "failures": []}

5 Delete document

DELETE my_blog/article/2

If a routing value was specified when the document was indexed, you can also pass the routing parameter when deleting it:

DELETE my_blog/article/2?routing=user123

Note1: if the routing value given when deleting is incorrect, the deletion fails.

Note2: when _routing is set to required in the mapping and no routing value is specified, a delete operation throws a routing-missing exception and the request is rejected.

6 Delete by query

POST my_blog/_delete_by_query
{"query": {"term": {"title": {"value": "mybatis"}}}}

Delete all documents under one type:

POST my_blog/article/_delete_by_query
{"query": {"match_all": {}}}

7 Bulk operation

7.1 Command format

Use the following command:

curl -XPOST 'localhost:9200/indexname/_bulk?pretty' --data-binary @accounts.json

The accounts.json file should meet the following format:

action_and_meta_data line
data line

Note1: in the action_and_meta_data line, the action must be index, create, update, or delete, and the metadata must indicate the _index, _type, and _id of the document to operate on.

Note2: the data line holds the document content. It is needed when adding documents (and for update actions); a delete action takes no data line.

Note3: every line, including the last one, must end with a newline character "\n", which is what separates the lines from each other.
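The format rules in these notes are easy to get wrong by hand, so a small generator can help. The sketch below uses only Python's standard json module; `build_bulk_body` and its (action, metadata, data) tuple layout are conventions invented for this example, not an es API.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, data) triples into a _bulk request body.

    data is None for actions that take no data line (delete).
    """
    allowed = {"index", "create", "update", "delete"}
    lines = []
    for action, metadata, data in operations:
        if action not in allowed:
            raise ValueError("unsupported bulk action: %s" % action)
        lines.append(json.dumps({action: metadata}))
        if data is not None:
            lines.append(json.dumps(data))
    # Every line, including the last one, is terminated by "\n".
    return "".join(line + "\n" for line in lines)


body = build_bulk_body([
    ("index", {"_index": "my_blog", "_type": "article", "_id": "1"},
     {"title": "blog title"}),
    ("delete", {"_index": "my_blog", "_type": "article", "_id": "2"}, None),
])
print(body, end="")
```

Writing `body` to a file such as accounts.json yields input in the shape that the curl command in 7.1 expects.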

7.2 Add document

{"index": {"_index": "my_blog", "_type": "article", "_id": "1"}}
{"title": "blog title"}

{"create": {"_index": "my_blog", "_type": "article", "_id": "1"}}
{"title": "blog title"}

The _id may be omitted, in which case es generates one automatically, as in section 1.2.

7.3 Delete document

{"delete": {"_index": "website", "_type": "blog", "_id": "123"}}

7.4 Combined example

The following includes a delete request, a create request, an index request, and an update request:

{"delete": {"_index": "website", "_type": "blog", "_id": "123"}}
{"create": {"_index": "website", "_type": "blog", "_id": "123"}}
{"title": "blog title"}
{"index": {"_index": "website", "_type": "blog"}}
{"title": "blog title"}
{"update": {"_index": "website", "_type": "blog", "_id": "123"}}
{"doc": {"title": "blog title"}}

8 Version control

On a first read, 8.2 can be skipped; it is enough to learn the commands in 8.3.

When es updates the document, it first reads the source document, updates the original document, and then re-indexes the whole document after the update operation is completed.

8.1 Lock Control

It is very likely that multiple users modify and update the data of the same document at the same time, which requires transactional control, or concurrency control.

8.1.1 pessimistic lock control

If a thread modifies the data and locks the data, other threads need to wait for the current lock to be released to access, which ensures that at most one thread accesses the data at a time. Many of these locking mechanisms are used in traditional relational databases, such as row locks, table locks, read locks, write locks and so on.

8.1.2 optimistic lock control

The data is not locked; only when a change is submitted is it checked whether data integrity would be violated. Elasticsearch uses an optimistic locking mechanism, which suits applications with more read operations than write operations: it saves lock overhead and improves throughput.
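A minimal sketch of the optimistic approach, in plain Python and independent of es: nothing is locked between reading and writing, and a stale write is rejected at submit time so the client can re-read and retry. `OptimisticStore`, `VersionConflict` and the "stock" field are hypothetical names chosen for the illustration.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class OptimisticStore:
    """Single-document store with compare-and-set semantics; illustrative only."""

    def __init__(self, value):
        self.value = value
        self.version = 1

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version):
        # No lock is held between read() and write(); the check happens on submit.
        if expected_version != self.version:
            raise VersionConflict(
                "expected version %d, found %d" % (expected_version, self.version))
        self.value = new_value
        self.version += 1
        return self.version


store = OptimisticStore({"stock": 10})

# Two clients read the same version.
value_a, version_a = store.read()
value_b, version_b = store.read()

# The first write succeeds.
store.write({"stock": value_a["stock"] - 1}, version_a)

# The second write is based on a stale version, so it is rejected;
# the client re-reads and retries.
try:
    store.write({"stock": value_b["stock"] - 1}, version_b)
    conflict = False
except VersionConflict:
    conflict = True
    value_b, version_b = store.read()
    store.write({"stock": value_b["stock"] - 1}, version_b)

print(conflict, store.read())
```

The cost of a conflict is one extra read and write for the losing client, which is why this approach pays off when writes to the same document rarely collide.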

8.2 Version control in es

Since es uses optimistic locking, how does it ensure that old data does not overwrite new data? In es, _version is used for version control. Every time a document is updated, its _version is incremented by 1.

The document version control mechanism of es mainly includes internal version control and external version control:

Internal version control requires the version number carried by a request to be equal to the document's current version for the operation to succeed; external version control requires the external version number to be higher than the document's current version for the update to succeed.

In fact, no matter whether you request to obtain data or update data, you can carry the version number. No matter how complicated the situation is, you only need to remember the following two points:

1. If the request only retrieves data, the internal version control mechanism takes effect but the external one does not:
   a. Without a version number, the operation succeeds;
   b. With a version number, it must be equal to the document's current version number;
   c. With an external version number, it must also be equal to the document's current version number.
2. If it is an update operation:
   a. Without a version number, the operation succeeds and the document version number is incremented by 1;
   b. With a version number, it must be equal to the document's current version number, and the document version number is incremented by 1;
   c. With an external version number, it must be greater than the document's current version number, and the document version becomes equal to the version number carried.

You can also think about this from the point of view of whether or not to carry a version number:

1. Without a version number, both retrieval and update requests succeed;
2. With a version number, it must be equal to the document version number, whichever the operation;
3. With an external version number, it must be equal to the document version number when retrieving data, and greater than the document version number when updating.
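The rules as described in this section can be written down as one small decision function. This is only a Python restatement of the text above, not code taken from es; `next_version`, its parameters and `VersionConflict` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when the version carried by a request is not acceptable."""


def next_version(operation, current, version=None, version_type="internal"):
    """Return the document version after the request, or raise VersionConflict.

    operation: "get" or "update"
    current: the document's current _version
    version: the version number carried by the request, or None
    version_type: "internal" or "external"
    """
    if operation not in ("get", "update"):
        raise ValueError("operation must be 'get' or 'update'")
    if version_type not in ("internal", "external"):
        raise ValueError("version_type must be 'internal' or 'external'")

    if version is None:
        # No version number carried: the request always succeeds.
        return current + 1 if operation == "update" else current

    if operation == "get" or version_type == "internal":
        # Equality is required for every read and for internal-version updates.
        if version != current:
            raise VersionConflict("expected %d, got %d" % (current, version))
        return current + 1 if operation == "update" else current

    # Update with an external version: it must be strictly greater,
    # and the document takes over the carried version.
    if version <= current:
        raise VersionConflict("external version %d is not greater than %d"
                              % (version, current))
    return version


for args in [("get", 3), ("update", 3), ("update", 3, 3),
             ("update", 3, 5, "external"), ("update", 3, 3, "external")]:
    try:
        print(args, "->", next_version(*args))
    except VersionConflict as error:
        print(args, "-> conflict:", error)
```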

Why is es designed this way? The internal version number is maintained by es itself, but your business or program may have its own requirements: your application may already maintain a version number, or you may want to maintain one through some other mechanism. In that case you can use an external version number.

8.3 Command operation

GET website/blog/1?version=1

PUT website/blog/1?version=5&version_type=external

9 Routing mechanism

9.1 Shard location calculation and example

When an index has multiple shards, how does es decide which shard a document is saved to when it is indexed? Suppose the environment is as follows:

Master node: shard0 (primary), shard1, shard2 (primary)
Common node: shard0, shard1 (primary), shard2

The es routing mechanism uses a hash algorithm to place documents whose routing values hash to the same result on the same primary shard, as follows:

shard = hash(routing) % number_of_primary_shards

By default the routing value is the document's _id. If we add a document without specifying an id and es generates the id AWagTCv8O1qbT1zqbREV for us, then applying the formula above, shard should be:

shard = hash("AWagTCv8O1qbT1zqbREV") % 3

The actual implementation of the hash function is not important here; we can call Python's built-in hash() function to demonstrate the calculation (its return value depends on the Python version and hash seed, and es uses its own hash function, so the result is only illustrative):

>>> shard = hash("AWagTCv8O1qbT1zqbREV") % 3
>>> shard
2

Obviously, the document will be stored on shard 2, the third shard.

As you can see from the above, the default routing scheme distributes data evenly across shards, but you can also customize the routing value to control where a document is stored.
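For a demonstration that gives the same answer on every run, the sketch below swaps in `zlib.crc32` as the hash. That choice is an assumption made purely for reproducibility: es uses its own hash function internally, so the shard numbers printed here will not match a real cluster. `shard_for` is a hypothetical helper.

```python
import zlib
from collections import Counter


def shard_for(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, with CRC32 as the hash."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The same routing value always maps to the same shard.
first = shard_for("AWagTCv8O1qbT1zqbREV", 3)
second = shard_for("AWagTCv8O1qbT1zqbREV", 3)
print(first == second)

# Count how many of 3000 distinct ids land on each shard.
distribution = Counter(shard_for("doc-%d" % i, 3) for i in range(3000))
print(sorted(distribution.items()))
```

Because the formula divides by the number of primary shards, changing that number would move existing documents to different shard numbers, which is one way to see why the routing value and shard count have to stay consistent between indexing and retrieval.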

9.2 es query process and custom routing value

If there is an index with 50 shards, a query on the cluster is executed as follows:

(1) the query request is first received by one node in the cluster;
(2) the node that received the request broadcasts the query to each shard of the index;
(3) each shard executes the search query and returns its results;
(4) the results are merged, sorted and returned to the user by the node that received the request (the coordinating node).
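The four steps can be sketched in a few lines of Python, with each shard modelled as a plain list of documents. Everything here is hypothetical scaffolding: `search_shard`, `broadcast_search` and sorting by posttime are stand-ins (a real search typically sorts by relevance score).

```python
def search_shard(shard_docs, user):
    """Step (3): run the query on one shard and return its matching documents."""
    return [doc for doc in shard_docs if doc["user"] == user]


def broadcast_search(shards, user, size=10):
    """Steps (2) to (4), as performed by the node that received the request."""
    # (2) + (3): send the query to every shard and collect the partial results.
    partial_results = [search_shard(docs, user) for docs in shards]
    # (4): merge the partial results, sort them, and keep the top entries.
    merged = [doc for hits in partial_results for doc in hits]
    merged.sort(key=lambda doc: doc["posttime"], reverse=True)
    return merged[:size], len(shards)


shards = [
    [{"id": 1, "user": "user123", "posttime": "2017-05-01"}],
    [{"id": 2, "user": "user456", "posttime": "2017-05-02"}],
    [{"id": 3, "user": "user123", "posttime": "2017-05-03"}],
]
hits, shards_queried = broadcast_search(shards, "user123")
print([doc["id"] for doc in hits], shards_queried)
```

Note that every shard is queried even though one of them holds no matching document; with 50 shards the same request would fan out 50 times.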

This broadcast can be avoided by customizing routing values, as illustrated by an example below.

Normally we add a document like this:

PUT website/article/1
{"title": "My first blog entry", "text": "Just trying this out...", "user": "user123"}

Suppose we then want to query all articles by user123:

GET website/article/_search
{"query": {"term": {"user": "user123"}}}

Obviously, this query follows the process above, that is, the request is sent to all shards. This is what we want to optimize.

Add a document using user as the routing value:

PUT website/article/1?routing=user123
{"title": "My first blog entry", "text": "Just trying this out...", "user": "user123"}

Once the routing value is specified, all subsequent articles (documents) that user123 publishes with routing=user123 are stored on the same shard, say shard 1. When we then query articles published by user123, we only need to specify the routing value when searching, so the request is not broadcast and goes directly to shard 1. The query is as follows:

GET website/article/_search?routing=user123

Note1: this can also cause problems. For example, if user123 has published hundreds of thousands of articles while other users have only a few, the data is obviously distributed unevenly.

Note2: multiple routing values can also be specified, separated by commas, in which case more than one shard may be involved. How es chooses among the candidate shards, and which algorithm it uses, is left for you to explore.
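Both effects, the targeted search and the uneven distribution from Note1, show up in a small simulation. As before this is a sketch under assumptions: CRC32 stands in for the hash that es really uses, and `shard_for`, `index_document` and `routed_search` are names invented for the example.

```python
import zlib

NUMBER_OF_PRIMARY_SHARDS = 3


def shard_for(routing):
    # CRC32 is a stand-in hash chosen for this sketch; es uses its own.
    return zlib.crc32(routing.encode("utf-8")) % NUMBER_OF_PRIMARY_SHARDS


def index_document(shards, doc, routing):
    """Store the document on the shard selected by its routing value."""
    shards[shard_for(routing)].append(doc)


def routed_search(shards, user):
    """Search only the shard that the routing value points to."""
    target = shard_for(user)
    hits = [doc for doc in shards[target] if doc["user"] == user]
    return hits, [target]


shards = [[] for _ in range(NUMBER_OF_PRIMARY_SHARDS)]

# One very active user...
for i in range(1000):
    index_document(shards, {"id": i, "user": "user123"}, routing="user123")

# ...and a handful of occasional ones.
for i in range(5):
    user = "user%d" % i
    index_document(shards, {"id": 1000 + i, "user": user}, routing=user)

hits, shards_queried = routed_search(shards, "user123")
print(len(hits), shards_queried)
print([len(docs) for docs in shards])
```

The search touches a single shard and still finds every article by user123, but the per-shard counts printed on the last line show that one shard holds almost all of the data.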
