This article explains how Elasticsearch routes a document to a shard. The content is simple and clear and easy to understand; please follow along with the editor and study how a document is routed to a shard.
When a document is indexed, it is stored in a single primary shard. How does Elasticsearch know which shard a document should be stored in? When we create a document, how does it decide whether that document goes to shard 1 or shard 2?
First of all, it certainly is not random; otherwise we would not know where to look when we want to retrieve the document later. In fact, this process is determined by the following formula:
shard = hash(routing) % number_of_primary_shards
routing is a variable value, which defaults to the document's _id but can also be set to a custom value. routing is passed through a hash function to produce a number, which is then divided by number_of_primary_shards (the number of primary shards) to get a remainder. This remainder, which always falls between 0 and number_of_primary_shards - 1, is the shard where the document lives.
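To make the formula concrete, here is a minimal Python sketch of the idea. Elasticsearch actually uses its own Murmur3-based hash internally, so Python's built-in hash() below is only a stand-in to show how the modulo maps a routing value to a shard number.

```python
# Illustration of: shard = hash(routing) % number_of_primary_shards
# Elasticsearch uses a Murmur3-based hash internally; Python's hash() is
# only a stand-in here, and its output varies between interpreter runs.

number_of_primary_shards = 5

def pick_shard(routing: str) -> int:
    # routing defaults to the document _id unless a custom value is supplied
    return hash(routing) % number_of_primary_shards

for doc_id in ["1", "2", "user_42"]:
    print(f"document {doc_id!r} -> shard {pick_shard(doc_id)}")
```

This also illustrates why the number of primary shards is fixed when the index is created: if it changed later, the same routing value would map to a different shard and existing documents could no longer be found.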
All document APIs (get, index, delete, bulk, update, and mget) accept a routing parameter, through which we can customize the document-to-shard mapping. A custom routing value can be used to ensure that all related documents, for example all documents belonging to the same user, are stored in the same shard.
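As a sketch of what this looks like in practice, assuming a local cluster at http://localhost:9200, a hypothetical index named forums, and the official elasticsearch Python client with 8.x-style keyword arguments:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Index a document, routing it by user id instead of the default (_id),
# so that all of this user's documents land on the same shard.
es.index(
    index="forums",          # hypothetical index name
    id="1",
    routing="user_123",      # custom routing value
    document={"user": "user_123", "title": "Hello"},
)

# The same routing value must be supplied when reading the document back,
# otherwise Elasticsearch may look on the wrong shard.
doc = es.get(index="forums", id="1", routing="user_123")
print(doc["_source"])
```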
Shards are not free. Remember:
Under the hood, a shard is a Lucene index, which consumes file handles, memory, and CPU cycles.
Every search request needs to hit every shard in the index. This is fine if each shard sits on a different node, but it becomes a problem when multiple shards have to compete for the same resources on the same node.
The term statistics used to calculate relevance are kept per shard. If there are many shards, each holding only a small amount of data, relevance scores can end up very inaccurate.
So how many shards should an index have? Applied to your own specific scenario, this question is easy to answer:
Create a cluster consisting of a single node, using the hardware you intend to use in production.
Create an index with the same settings and analyzers you intend to use in production, but give it only one primary shard and no replica shards.
Fill it with real documents (or documents as close to real as possible).
Run real queries and aggregations (or ones as close to real as possible).
Basically, you need to replicate real-world usage and compress it all into a single shard until it breaks. The definition of "breaks" is also up to you: some users require all responses to come back within 50 milliseconds, while others are happy to wait five seconds.
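A sketch of the index-creation step of this test, again assuming the elasticsearch Python client (8.x-style) and a hypothetical index name; one primary shard and no replicas means the whole test load exercises a single shard:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed single-node test cluster

# One primary shard, no replicas: the capacity test hits a single shard.
es.indices.create(
    index="capacity-test",   # hypothetical index name
    settings={
        "number_of_shards": 1,
        "number_of_replicas": 0,
    },
    # mappings and analyzers should mirror what you plan to use in
    # production, e.g. mappings={...}
)
```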
Once you have determined the capacity of a single shard, it is easy to calculate the number of shards for the entire index. Take the total amount of data you need to index, plus some expected growth, and divide it by the capacity of a single shard; the result is the number of primary shards you need.
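As a purely illustrative calculation (the numbers below are assumptions, not recommendations): if a single shard handled about 30 GB in the test above and you expect roughly 250 GB of data including growth, you would need nine primary shards.

```python
import math

single_shard_capacity_gb = 30   # measured in your single-shard test (assumed)
expected_total_data_gb = 250    # current data plus expected growth (assumed)

number_of_primary_shards = math.ceil(
    expected_total_data_gb / single_shard_capacity_gb
)
print(number_of_primary_shards)  # -> 9
```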
At write time, a replica shard does the same work as the primary shard. A new document is first indexed on the primary shard and then synchronized to all replica shards. Increasing the number of replicas does not increase the index's write capacity.
However, replica shards can serve read requests. If, as is common, your index is query-heavy, you can improve query performance by increasing the number of replicas, but you will also need additional hardware resources to support them.
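Unlike the number of primary shards, the number of replicas can be changed on a live index. A minimal sketch, again assuming the elasticsearch Python client (8.x-style) and a hypothetical index name:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

# Raise the replica count to serve more read traffic; this only helps if
# there are extra nodes (and hardware) to host the new replica shards.
es.indices.put_settings(
    index="forums",                                 # hypothetical index name
    settings={"index": {"number_of_replicas": 2}},
)
```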
(Figure: an index with two primary shards and one replica can scale out across four nodes)
(Figure: balancing node load by adjusting the number of replicas)
Thank you for reading. That concludes this article on how Elasticsearch routes a document to a shard. After studying it, you should have a deeper understanding of how a document is routed to a shard; the specifics still need to be verified by your own practice. The editor will continue to push more articles on related topics, so stay tuned!