Dynamically expanding the number of shards of an index under ES 7.5

2025-02-28 Update From: SLTechnology News&Howtos


In older versions of ES (for example, 2.3), once the number of shards of an index was set, it could not be changed without rebuilding the data.

Starting with ES 6.1, the number of shards can be expanded online via the split API (note: the index must be blocked for writes during the operation).

Starting with ES 7.0, you no longer need to set the index.number_of_routing_shards parameter at index creation time in order to split.
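For context, a sketch of what index creation had to look like on 6.x to allow a later split (the index name twitter_v6 and the target of 8 shards are illustrative assumptions, not from the experiment below):

```shell
# On 6.x, the source index had to be created with index.number_of_routing_shards
# already set, or a later split to 8 shards would be rejected.
# Host and index name are placeholders.
curl -s -X PUT "http://1.1.1.1:9200/twitter_v6?pretty" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_routing_shards": 8,
    "index.number_of_replicas": 0
  }
}'
```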

Please refer to the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/7.5/indices-split-index.html

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/indices-split-index.html

The split process:

1. Create a new target index with the same definition as the source index, but with a larger number of primary shards.

2. Hard-link the segments from the source index into the target index. (If the file system does not support hard links, copying all segments to the new index is a time-consuming process.)

3. After the low-level files are created, hash all documents again and delete the documents that belong to a different shard.

4. Recover the target index as if it were a closed index that had just been reopened.
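Step 3 stays local because of how hash-based routing interacts with a multiplied shard count. A minimal bash sketch (using the document id directly as the hash value, a simplification of ES's murmur3 routing) shows that when going from 2 to 8 shards, every document's new shard maps back to its old shard, so each target shard only ever holds documents from a single source shard:

```shell
# Simplified routing: shard = hash % number_of_shards, with hash = id here.
# Going from 2 to 8 shards (a x4 multiple), new_shard % 2 always equals
# old_shard, so the re-hash in step 3 only deletes documents locally.
for id in 0 1 2 3 4 5 6 7; do
  old=$(( id % 2 ))
  new=$(( id % 8 ))
  echo "doc $id: old shard $old -> new shard $new (maps back to $(( new % 2 )))"
done
```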

Why does ES not support incremental resharding?

That is, going from N shards to N + 1 shards. Incremental resharding is indeed a feature supported by many key-value stores. Simply adding a new shard and pushing new data into it is not feasible: the new shard would likely become an indexing bottleneck, and since the shard a document belongs to is derived from its _id (which is how get, delete, and update requests locate documents), the routing would become complicated. This means the existing data would have to be rebalanced using a different hashing scheme.

The most common way for key-value stores to do this efficiently is consistent hashing. When the number of shards grows from N to N + 1, consistent hashing only requires relocating 1/N of the keys. However, Elasticsearch's unit of storage, the shard, is a Lucene index. Because of its search-oriented data structures, deleting even a small portion of a Lucene index, say only 5% of documents, and indexing those documents on another shard is typically much more expensive than moving key-value pairs. As mentioned above, this cost stays reasonable when the number of shards is increased by a multiplicative factor: it allows Elasticsearch to perform the split locally at the index level, deleting documents rather than reindexing the ones that need to move, and using hard links for efficient file copying.
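The contrast can be made concrete with a small bash experiment over 1000 keys under plain modulo routing (a simplification; ES actually hashes _routing with murmur3): growing 2 -> 3 shards forces most keys across shards, while growing 2 -> 4 keeps every key inside its original shard's descendants:

```shell
moved=0   # keys whose shard changes when going from 2 to 3 shards
cross=0   # keys that would leave their source shard's family going 2 -> 4
for k in $(seq 0 999); do
  if [ $(( k % 2 )) -ne $(( k % 3 )) ]; then moved=$(( moved + 1 )); fi
  if [ $(( (k % 4) % 2 )) -ne $(( k % 2 )) ]; then cross=$(( cross + 1 )); fi
done
echo "2 -> 3 shards: $moved / 1000 keys must relocate"
echo "2 -> 4 shards: $cross / 1000 keys must leave their source shard"
```

Roughly two thirds of the keys relocate in the first case and none in the second, which is why split requires the target shard count to be a multiple of the source.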

For append-only data, you can gain more flexibility by creating a new index and pushing new data to it, while adding an alias that covers both the old and new indexes for read operations. Assuming the old and new indexes have M and N shards respectively, searching them has no overhead compared to searching a single index with M + N shards.

Prerequisites for splitting an index:

1. The target index must not exist.

2. The source index must have fewer primary shards than the target index.

3. The number of primary shards in the target index must be a multiple of the number of primary shards in the source index.

4. The node handling the split must have enough free disk space to hold a second copy of the existing index.

The specific hands-on steps follow:

Tips: the test machines are limited, so the replica count of the index is set to 0 here; in production, use at least replica >= 1.

# create an index with 2 primary shards and no replicas

curl -s -X PUT "http://1.1.1.1:9200/twitter?pretty" -H 'Content-Type: application/json' -d '{"settings": {"index.number_of_shards": 2, "index.number_of_replicas": 0}, "aliases": {"my_search_indices": {}}}'

# write a few pieces of test data

curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/11?pretty" -H 'Content-Type: application/json' -d '{"id": 11, "name": "lee", "age": "23"}'
curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/22?pretty" -H 'Content-Type: application/json' -d '{"id": 22, "name": "amd", "age": "22"}'

# query data

curl -s -X GET "http://1.1.1.1:9200/my_search_indices/_search" | jq .

# block writes to the index in preparation for the split operation

curl -s -X PUT "http://1.1.1.1:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d '{"settings": {"index.blocks.write": true}}'

# try writing data to confirm that the write block is effective

curl -s -X PUT "http://1.1.1.1:9200/twitter/_doc/33?pretty" -H 'Content-Type: application/json' -d '{"id": 33, "name": "amd", "age": "33"}'

# remove the alias from the twitter index

curl -s -X POST "http://1.1.1.1:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{"actions": [{"remove": {"index": "twitter", "alias": "my_search_indices"}}]}'

# start the split operation; the target index is named new_twitter and has 8 primary shards

curl -s -X POST "http://1.1.1.1:9200/twitter/_split/new_twitter?pretty" -H 'Content-Type: application/json' -d '{"settings": {"index.number_of_shards": 8, "index.number_of_replicas": 0}}'

# add the alias to the new index

curl -s -X POST "http://1.1.1.1:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{"actions": [{"add": {"index": "new_twitter", "alias": "my_search_indices"}}]}'

Result:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "new_twitter"
}

Note:

To see the progress of the split, you can use the _cat/recovery API, or check it on the cerebro interface.
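For example (the column selection via h is an assumption; _cat/recovery supports v for headers and h for choosing columns):

```shell
# Watch the split's recovery progress per shard of the target index.
curl -s "http://1.1.1.1:9200/_cat/recovery/new_twitter?v&h=index,shard,stage,files_percent,bytes_percent"
```

The stage column reaches done and the percentage columns reach 100% once each target shard has finished recovering.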

# query the new index; the data reads back normally

curl -s -X GET "http://1.1.1.1:9200/my_search_indices/_search" | jq .

# try writing to the new index; you can see that it fails

curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{"id": 33, "name": "amd", "age": "33"}'

# re-enable writes on the index

curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_settings?pretty" -H 'Content-Type: application/json' -d '{"settings": {"index.blocks.write": false}}'

# write data to the new index again; this time the writes succeed

curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{"id": 33, "name": "amd", "age": "33"}'
curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/44?pretty" -H 'Content-Type: application/json' -d '{"id": 44, "name": "intel", "age": "4"}'

# At this point the old index is read-only. Once we have confirmed the new index is OK, we can consider closing or deleting the old twitter index.
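Both options are one curl call away; closing is easy to reverse, while deleting frees the disk space:

```shell
# Option 1: close the old index (it can be reopened later if needed)
curl -s -X POST "http://1.1.1.1:9200/twitter/_close?pretty"

# Option 2: delete the old index outright
curl -s -X DELETE "http://1.1.1.1:9200/twitter?pretty"
```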

In a screenshot from a production run (not reproduced here), each shard of the new index held only half the data volume of an old-index shard, spreading the indexing load.
