
Elasticsearch: indexing documents

2025-02-23 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

The content is mainly a translation of the official Elasticsearch documentation, version 7.10.

Indexing a document (using curl)

curl -X PUT "localhost:9200/twitter/_doc/1" -H 'Content-Type: application/json' -d'
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elasticsearch"
}
'

-X option: the HTTP method to use (GET, PUT, POST, DELETE, ...). Without -X, curl sends GET, or POST when -d is given.

-H option: adds a request header, here Content-Type: application/json.

-d option: the request body, i.e. the JSON document.
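If you are calling the API from a program instead of the shell, the same request can be built with Python's standard library. This is only a sketch. It assumes a node at localhost:9200, and `build_index_request` is a hypothetical helper name. The request is constructed but never sent, and each keyword argument is annotated with the curl option it replaces.

```python
import json
import urllib.request

# Same document as the curl example; note the commas between JSON fields.
doc = {
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "trying out Elasticsearch",
}


def build_index_request(base_url: str, index: str, doc_id: str,
                        body: dict) -> urllib.request.Request:
    """Build (but do not send) the equivalent of
    curl -X PUT "<base_url>/<index>/_doc/<id>" -H 'Content-Type: application/json' -d '<json>'
    """
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=json.dumps(body).encode("utf-8"),         # -d
        headers={"Content-Type": "application/json"},  # -H
        method="PUT",                                  # -X PUT
    )


req = build_index_request("http://localhost:9200", "twitter", "1", doc)
print(req.get_method(), req.full_url)
# Sending it requires a running node: urllib.request.urlopen(req)
```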

If the target index does not exist, it is created automatically. This behavior is controlled by the cluster setting action.auto_create_index:

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "twitter,index10,-index1*,+ind*"
  }
}

Note: indices named twitter or index10 may be created automatically, indices matching index1* may not, and any other index matching ind* may. Patterns are matched in the order given, so the first matching pattern decides.

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false"
  }
}

Note: no index is created automatically. Indexing into a missing index returns an error, for example: {"error": {"root_cause": [{"type": "index_not_found_exception", "reason": "no such index [mytwitter]"

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "true"
  }
}

Note: any index can be created automatically. This is the default.
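The pattern rules above can be modeled in a few lines. This is a simplified sketch and not Elasticsearch code. The names `may_auto_create` and `simple_match` are hypothetical, only `*` is treated as a wildcard, and the special rules for hidden and system indices are ignored. Patterns are checked in order and the first match decides. A name that matches no pattern is not auto-created.

```python
import re


def simple_match(pattern: str, name: str) -> bool:
    # Only "*" acts as a wildcard; every other character matches literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def may_auto_create(setting: str, index: str) -> bool:
    """Simplified model of action.auto_create_index (hidden/system rules ignored)."""
    value = setting.strip()
    if value.lower() == "true":
        return True
    if value.lower() == "false":
        return False
    # Comma-separated patterns are checked in order; the first match decides.
    for raw in value.split(","):
        expr = raw.strip()
        allow = not expr.startswith("-")
        if expr.startswith(("+", "-")):
            expr = expr[1:]
        if simple_match(expr, index):
            return allow
    # No pattern matched: automatic creation is refused.
    return False


setting = "twitter,index10,-index1*,+ind*"
for name in ["twitter", "index10", "index11", "indigo", "mytwitter"]:
    print(name, may_auto_create(setting, name))
```

index10 also matches -index1*, but it is allowed because the explicit index10 entry comes first.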

In 7.x an index can have only one mapping type. When you use the typeless _doc endpoint, that type is _doc.

For example, try to create a second type named mydoc:

curl -X PUT "localhost:9200/twitter/mydoc/1" -H 'Content-Type: application/json' -d'
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elasticsearch"
}
'

An error is returned: {"error": {"root_cause": [{"type": "illegal_argument_exception", "reason": "Rejecting mapping update to [twitter] as the final mapping would have more than 1 type: [_doc, mydoc]"}], "type": "illegal_argument_exception", "reason": "Rejecting mapping update to [twitter] as the final mapping would have more than 1 type

2. op_type (put if absent): with op_type=create, the request only creates new documents and never updates an existing one.

For example:

curl -X PUT "localhost:9200/twitter/_doc/1?op_type=create" -H 'Content-Type: application/json' -d'
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elasticsearch"
}
'

If the document twitter/_doc/1 already exists, the request fails with a version conflict (HTTP 409).

The following _create request is equivalent:

curl -X PUT "localhost:9200/twitter/_create/1" -H 'Content-Type: application/json' -d'
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elasticsearch"
}
'
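The difference between the default op_type=index and op_type=create can be shown with a toy in-memory store. `ToyIndex` and `VersionConflictError` are hypothetical stand-ins and are not part of any Elasticsearch client. A real server answers the conflict with HTTP 409.

```python
class VersionConflictError(Exception):
    """Stands in for the HTTP 409 version conflict returned by Elasticsearch."""


class ToyIndex:
    """Hypothetical in-memory stand-in for one index."""

    def __init__(self) -> None:
        self.docs = {}

    def put(self, doc_id: str, source: dict, op_type: str = "index") -> str:
        if op_type == "create" and doc_id in self.docs:
            # op_type=create (or the _create endpoint) never overwrites.
            raise VersionConflictError(f"[{doc_id}]: document already exists")
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = dict(source)
        return result


idx = ToyIndex()
print(idx.put("1", {"user": "kimchy"}, op_type="create"))  # created
print(idx.put("1", {"user": "kimchy2"}))                   # updated: op_type=index overwrites
try:
    idx.put("1", {"user": "someone"}, op_type="create")
except VersionConflictError as err:
    print("409:", err)
```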

Automatic generation of document ID:

If no document ID is specified, a unique ID is generated automatically. Such a request always creates a new document and never updates an existing one, because op_type is implicitly set to create:

Example:

curl -X POST "localhost:9200/twitter/_doc/" -H 'Content-Type: application/json' -d'
{
  "user": "mjj",
  "post_date": "2009-11-15T14:12:12",
  "message": "test Elasticsearch"
}
'

Response (excerpt): {"_index": "twitter", "_type": "_doc", "_id": "olLK42oBqV8-hMggVV3X"
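The "POST without an ID always creates" rule can be sketched as follows. The ID scheme is an assumption: Elasticsearch generates its own time-based, URL-safe Base64 IDs (20 characters, like the one above). This sketch uses a random UUID encoded as URL-safe Base64, which gives 22 characters. `generate_doc_id` and `post_without_id` are hypothetical names.

```python
import base64
import uuid


def generate_doc_id() -> str:
    # Assumption: random UUID, URL-safe Base64 without padding (22 chars).
    # This is not Elasticsearch's own time-based ID algorithm.
    return base64.urlsafe_b64encode(uuid.uuid4().bytes).rstrip(b"=").decode("ascii")


def post_without_id(docs: dict, source: dict) -> str:
    """Model of POST /twitter/_doc/: always creates, never overwrites."""
    doc_id = generate_doc_id()
    while doc_id in docs:  # practically never loops; guards against overwriting
        doc_id = generate_doc_id()
    docs[doc_id] = dict(source)
    return doc_id


store = {}
first = post_without_id(store, {"user": "mjj"})
second = post_without_id(store, {"user": "mjj"})  # same body, new document
print(first, second, len(store))
```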

3. Optimistic concurrency control


Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409. See Optimistic concurrency control for more details.

Every indexing response includes the document's _seq_no and _primary_term. To write conditionally, read these values first and pass them back as if_seq_no and if_primary_term. If another request has modified the document in the meantime, its sequence number or primary term no longer matches and the operation fails with error 409.
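The check can be sketched with a single-shard model. `SeqNoShard` and `VersionConflictError` are hypothetical names, and the model leaves out replication and primary-term changes. The scenario has two clients that read the same _seq_no and _primary_term and then both try to write. Only the first write succeeds.

```python
from typing import Optional


class VersionConflictError(Exception):
    """Stands in for the HTTP 409 version conflict returned by Elasticsearch."""


class SeqNoShard:
    """Hypothetical single-shard model: every write receives the next _seq_no."""

    def __init__(self, primary_term: int = 1) -> None:
        self.primary_term = primary_term
        self.next_seq_no = 0
        self.docs = {}  # doc_id -> (source, _seq_no, _primary_term)

    def index(self, doc_id: str, source: dict,
              if_seq_no: Optional[int] = None,
              if_primary_term: Optional[int] = None) -> dict:
        # This sketch requires both conditions together, as Elasticsearch expects.
        if (if_seq_no is None) != (if_primary_term is None):
            raise ValueError("if_seq_no and if_primary_term must be used together")
        if if_seq_no is not None:
            current = self.docs.get(doc_id)
            if current is None or (current[1], current[2]) != (if_seq_no, if_primary_term):
                raise VersionConflictError(f"[{doc_id}]: seq_no/primary_term mismatch")
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (dict(source), seq_no, self.primary_term)
        return {"_id": doc_id, "_seq_no": seq_no, "_primary_term": self.primary_term}


shard = SeqNoShard()
first = shard.index("1", {"user": "kimchy"})
ok = shard.index("1", {"user": "client-a"},
                 if_seq_no=first["_seq_no"], if_primary_term=first["_primary_term"])
try:
    shard.index("1", {"user": "client-b"},
                if_seq_no=first["_seq_no"], if_primary_term=first["_primary_term"])
except VersionConflictError as err:
    print("409:", err)
```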

4. Routing (the physical shard where the document is stored)

By default, shard placement (or routing) is controlled by using a hash of the document's id value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter. For example:

POST twitter/_doc?routing=kimchy
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elasticsearch"
}

In the example above, the "_doc" document is routed to a shard based on the routing parameter provided: "kimchy".

When setting up explicit mapping, the _routing field can be optionally used to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.

By default, the shard that stores a document is chosen by hashing the document ID. If you specify the routing parameter, the hash is computed on that value instead.

The _routing field can also be configured in the mapping so that the routing value is taken from the document itself. If _routing is defined as required and no routing value is provided or extracted, the index operation fails with an error.
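The basic formula in the Elasticsearch docs is shard_num = hash(_routing) % num_primary_shards, where _routing defaults to the document _id. The sketch below applies that formula, with two loudly flagged assumptions. First, zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses. Second, the routing-factor and routing_partition_size refinements are ignored. `shard_for` and `route` are hypothetical names.

```python
import zlib
from typing import Optional


def shard_for(routing_value: str, number_of_primary_shards: int) -> int:
    # shard_num = hash(_routing) % num_primary_shards
    # Assumption: crc32 replaces Elasticsearch's Murmur3 hash; routing factor ignored.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


def route(doc_id: str, number_of_primary_shards: int,
          routing: Optional[str] = None) -> int:
    # Without ?routing=..., the document _id is the routing value.
    return shard_for(routing if routing is not None else doc_id, number_of_primary_shards)


primaries = 5
# Every document sent with ?routing=kimchy lands on the same shard, whatever its _id.
print({doc_id: route(doc_id, primaries, routing="kimchy") for doc_id in ["a", "b", "c"]})
# Without routing, placement depends on each _id.
print({doc_id: route(doc_id, primaries) for doc_id in ["a", "b", "c"]})
```

Because the shard number depends on the primary count, this also shows why a document's shard would change if number_of_primary_shards changed.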

5. Wait for active shards

By default, a write operation only waits for the primary shard to be active before proceeding.

You can change this with the index setting index.write.wait_for_active_shards, or the per-request wait_for_active_shards parameter, to require more active shard copies. The default value is 1, which means only the primary. The primary counts as a shard copy.

If it is set to 2, the primary and at least one replica must be active before the write proceeds. This check happens before the write starts, so it reduces the chance of writing to fewer copies but does not eliminate it.

If it is set to all, every copy must be active, i.e. number_of_replicas + 1 copies. If that many copies are not active, the operation waits and eventually times out unless a new node joins the cluster to host the missing copy.

Note that number_of_replicas counts only the replica shards, whereas the active shard count includes the primary.

Example:

For example, suppose we have a cluster of three nodes, A, B, and C and we create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If we attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down, and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all 3 nodes are up), then the indexing operation will require 3 active shard copies before proceeding, a requirement which should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if we set wait_for_active_shards to all (or to 4, which is the same), the indexing operation will not proceed as we do not have all 4 copies of each shard active in the index. The operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.
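The arithmetic in this example can be sketched as a pre-flight check. `required_active_copies` and `can_proceed` are hypothetical names. Rejecting values outside 1..number_of_replicas+1 is a choice made by this sketch, not a claim about server behavior. With 3 nodes, at most 3 copies of a shard can be active, because Elasticsearch never places two copies of the same shard on one node.

```python
from typing import Union


def required_active_copies(wait_for_active_shards: Union[int, str],
                           number_of_replicas: int) -> int:
    """Translate wait_for_active_shards into a number of shard copies."""
    total_copies = number_of_replicas + 1  # primary + replicas
    if wait_for_active_shards == "all":
        return total_copies
    required = int(wait_for_active_shards)
    # Assumption of this sketch: values outside 1..total_copies are rejected.
    if not 1 <= required <= total_copies:
        raise ValueError(f"wait_for_active_shards must be 'all' or 1..{total_copies}")
    return required


def can_proceed(active_copies: int, wait_for_active_shards: Union[int, str],
                number_of_replicas: int) -> bool:
    # Checked before the write starts; it does not guarantee the write
    # itself reaches that many copies.
    return active_copies >= required_active_copies(wait_for_active_shards, number_of_replicas)


# Scenario from the text: nodes A, B, C and number_of_replicas=3 (4 copies per shard).
replicas = 3
for setting, active in [(1, 1), (3, 3), ("all", 3), (4, 3)]:
    print(setting, active, can_proceed(active, setting, replicas))
```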

6. Noop updates (no-op updates)

When updating a document using the index API a new version of the document is always created even if the document hasn't changed. If this isn't acceptable, use the _update API with detect_noop set to true. This option isn't available on the index API because the index API doesn't fetch the old source and isn't able to compare it against the new source.

There isn't a hard and fast rule about when noop updates aren't acceptable. It's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.

When a document is updated through the index API, a new version is created whether or not the content actually changed. If this is unacceptable, use the _update API with no-op detection (detect_noop) set to true. The index API does not offer this option because it does not fetch the old source to compare with the new one.
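The contrast can be sketched with two toy functions. `index_api` and `update_api` are hypothetical models, not client calls. For brevity the partial document is merged shallowly here, whereas Elasticsearch merges nested objects recursively. detect_noop defaults to true in the _update API, and the model uses the same default.

```python
def index_api(doc: dict, new_source: dict) -> dict:
    """Index API model: the source is replaced and the version always increases."""
    return {"_source": dict(new_source), "_version": doc["_version"] + 1, "result": "updated"}


def update_api(doc: dict, partial: dict, detect_noop: bool = True) -> dict:
    """_update API model with a partial "doc"."""
    # Shallow merge for brevity; Elasticsearch merges nested objects recursively.
    merged = {**doc["_source"], **partial}
    if detect_noop and merged == doc["_source"]:
        # Nothing changed: no new version is written.
        return {**doc, "result": "noop"}
    return {"_source": merged, "_version": doc["_version"] + 1, "result": "updated"}


stored = {"_source": {"user": "kimchy", "message": "hi"}, "_version": 1}
print(index_api(stored, stored["_source"])["_version"])                     # 2
print(update_api(stored, {"message": "hi"})["result"])                      # noop
print(update_api(stored, {"message": "hi"}, detect_noop=False)["_version"])  # 2
```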

7. Timeout

The primary shard assigned to perform the index operation might not be available when the index operation is executed. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the index operation will wait on the primary shard to become available for up to 1 minute before failing and responding with an error. The timeout parameter can be used to explicitly specify how long it waits. Here is an example of setting it to 5 minutes:

curl -X PUT "localhost:9200/twitter/_doc/1?timeout=5m" -H 'Content-Type: application/json' -d'
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elasticsearch"
}
'

If the primary shard is unavailable (for example, while it is recovering or relocating), the index operation waits for it. By default it gives up after 1 minute and reports an error. The example above raises this limit to 5 minutes.
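The waiting behavior can be sketched as a poll with a deadline. The sketch also shows how a value like 5m translates into seconds, using the time units Elasticsearch accepts (d, h, m, s, ms, micros, nanos). `parse_time_value`, `wait_for_primary`, `PrimaryShardTimeout` and the `is_primary_active` callback are all hypothetical. The server does not literally poll like this.

```python
import re
import time
from typing import Callable

# Time units accepted in Elasticsearch time values, in seconds.
_UNIT_SECONDS = {"nanos": 1e-9, "micros": 1e-6, "ms": 1e-3,
                 "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def parse_time_value(value: str) -> float:
    """Convert a time value such as "5m" or "500ms" into seconds."""
    match = re.fullmatch(r"(\d+)(nanos|micros|ms|s|m|h|d)", value.strip())
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


class PrimaryShardTimeout(Exception):
    """Stands in for the error returned when the primary stays unavailable."""


def wait_for_primary(is_primary_active: Callable[[], bool],
                     timeout: str = "1m", poll_interval: float = 0.01) -> None:
    """Poll until the (hypothetical) primary check succeeds or the timeout expires."""
    deadline = time.monotonic() + parse_time_value(timeout)
    while not is_primary_active():
        if time.monotonic() >= deadline:
            raise PrimaryShardTimeout(f"primary shard not active within [{timeout}]")
        time.sleep(poll_interval)


print(parse_time_value("1m"), parse_time_value("5m"))  # default vs. ?timeout=5m
```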

8. Versioning

Each indexed document is given a version number. By default, internal versioning is used that starts at 1 and increments with each update, deletes included.

Optionally, the version number can be set to an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.

When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. For example:

curl -X PUT "localhost:9200/twitter/_doc/1?version=2&version_type=external" -H 'Content-Type: application/json' -d'
{
  "message": "elasticsearch now has versioning support, double cool!"
}
'

Indexed document version control.

Every indexed document has a version number, which by default is managed internally by ES. It starts at 1 and is incremented by every update and every delete.

The version can also be controlled by an external system: set version_type to external and pass a version value. If the given version number is less than or equal to the current version number, a version conflict error is reported. The example above shows how to specify the version manually.
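Both versioning modes can be sketched with a small model. `VersedDoc`-style bookkeeping here is hypothetical: `VersionedDoc` and `VersionConflictError` are illustrative names. The upper bound is an assumption that reads "less than around 9.2e+18" as Java's Long.MAX_VALUE. With external versioning, a write is accepted only when the supplied version is strictly greater than the stored one.

```python
class VersionConflictError(Exception):
    """Stands in for the HTTP 409 version conflict returned by Elasticsearch."""


# Assumption: the "around 9.2e+18" limit taken as Java's Long.MAX_VALUE.
MAX_EXTERNAL_VERSION = 2**63 - 1


class VersionedDoc:
    """Hypothetical model of one document's version bookkeeping."""

    def __init__(self) -> None:
        self.source = None
        self.version = 0  # 0 means "no document yet"

    def index_internal(self, source: dict) -> int:
        # Internal versioning: 1 on creation, +1 on every later write.
        self.source = dict(source)
        self.version += 1
        return self.version

    def index_external(self, source: dict, version: int) -> int:
        # version_type=external: the caller supplies the version.
        if not 0 <= version <= MAX_EXTERNAL_VERSION:
            raise ValueError("external version must be a non-negative long")
        if self.source is not None and version <= self.version:
            raise VersionConflictError(
                f"version conflict: stored {self.version}, provided {version}")
        self.source = dict(source)
        self.version = version
        return self.version


doc = VersionedDoc()
print(doc.index_external({"message": "double cool!"}, 2))  # 2
try:
    doc.index_external({"message": "stale"}, 2)            # not greater than 2
except VersionConflictError as err:
    print("409:", err)
```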


© 2024 shulou.com SLNews company. All rights reserved.
