2025-02-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
In older versions of ES (for example, 2.3), once the number of shards of an index was set, it could not be changed without reindexing the data.
Starting with ES 6.1, ES supports increasing the shard count online via the split index API (note: the index must be blocked for writes during the operation).
Starting with ES 7.0, the index.number_of_routing_shards parameter no longer needs to be set in advance when splitting.
Please refer to the official documents:
https://www.elastic.co/guide/en/elasticsearch/reference/7.5/indices-split-index.html
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/indices-split-index.html
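On clusters older than 7.0, the maximum split factor has to be planned at index-creation time via index.number_of_routing_shards. A minimal sketch, reusing the host and index name from the experiment below:

```shell
# Pre-7.0 only: declare the maximum split factor up front.
# An index created with 2 primary shards and number_of_routing_shards=8
# can later be split into 4 or 8 shards, but no further.
curl -s -X PUT "http://1.1.1.1:9200/twitter?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index.number_of_shards": 2, "index.number_of_routing_shards": 8}}'
```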
The split process:
1. Create a new target index with the same definition as the source index, but with a larger number of primary shards.
2. Hard-link the segments of the source index into the target index. (If the file system does not support hard links, all segments are copied into the new index, which is much more time-consuming.)
3. After the low-level files are created, hash all documents again and delete the documents that belong to a different shard.
4. Recover the target index as if it were a closed index that had just been reopened.
Why does ES not support incremental resharding?
Going from N shards to N + 1 shards, i.e. incremental resharding, is indeed a feature supported by many key-value stores. Simply adding a new shard and pushing new data into it is not an option: that shard would likely become an indexing bottleneck, and determining which shard a document belongs to from its _id, which is necessary for get, delete, and update requests, would become complicated. This means the existing data has to be rebalanced using a different hashing scheme.
The most common way for key-value stores to do this efficiently is consistent hashing, which only requires relocating 1/N of the keys when growing from N to N + 1 shards. However, Elasticsearch's unit of storage, the shard, is a Lucene index. Because of its search-oriented data structure, taking even a small portion of a Lucene index, say 5% of its documents, deleting those documents, and indexing them on another shard is typically much more expensive than the equivalent operation in a key-value store. This cost stays reasonable when the number of shards is grown by a multiplicative factor: it allows Elasticsearch to perform the split locally, which in turn allows the split to be performed at the index level rather than by reindexing the documents that need to move, and to use hard links for efficient file copying.
For append-only data, more flexibility can be gained by creating a new index and pushing new data to it, while adding an alias that covers both the old and the new index for read operations. Assuming the old and new indices have M and N shards respectively, this has no overhead compared to searching an index with M + N shards.
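The append-only alternative described above needs nothing beyond the plain index and alias APIs. A sketch with illustrative index and alias names:

```shell
# Create a fresh index to receive new writes.
curl -s -X PUT "http://1.1.1.1:9200/twitter-000002?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index.number_of_shards": 2, "index.number_of_replicas": 0}}'

# Point one read alias at both the old and the new index;
# searches against the alias fan out across all M + N shards.
curl -s -X POST "http://1.1.1.1:9200/_aliases?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"actions": [
        {"add": {"index": "twitter",        "alias": "twitter-read"}},
        {"add": {"index": "twitter-000002", "alias": "twitter-read"}}
      ]}'
```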
Prerequisites for splitting an index:
1. The target index must not exist.
2. The source index must have fewer primary shards than the target index.
3. The number of primary shards in the target index must be a multiple of the number of primary shards in the source index.
4. The node handling the split must have enough free disk space to hold a second copy of the existing index.
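Before running the split, it is worth confirming these prerequisites; for example, the current primary shard count and the free disk per node can be checked like this (host and index names taken from the experiment below):

```shell
# Current primary shard count of the source index; the target count
# must be a multiple of this value.
curl -s "http://1.1.1.1:9200/twitter/_settings?pretty" \
  | jq '.twitter.settings.index.number_of_shards'

# Disk usage per node; the split needs room for a second copy of the index.
curl -s "http://1.1.1.1:9200/_cat/allocation?v"
```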
The hands-on experiment follows.
Tip: the test machines are limited, so the replica count of the index is set to 0 here; in production, keep replicas >= 1.
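If replicas are lowered like this for a test, remember to raise them again once the split is done; a settings update is enough (index name illustrative):

```shell
# Restore at least one replica per primary shard for production use.
curl -s -X PUT "http://1.1.1.1:9200/twitter/_settings?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"index.number_of_replicas": 1}'
```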
# Create an index with 2 primary shards and no replicas
curl -s -X PUT "http://1.1.1.1:9200/twitter?pretty" -H 'Content-Type: application/json' -d '{"settings": {"index.number_of_shards": 2, "index.number_of_replicas": 0}, "aliases": {"my_search_indices": {}}}'
# Write a few test documents
curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/11?pretty" -H 'Content-Type: application/json' -d '{"id": 11, "name": "lee", "age": "23"}'
curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/22?pretty" -H 'Content-Type: application/json' -d '{"id": 22, "name": "amd", "age": "22"}'
# Query the data
curl -s -XGET "http://1.1.1.1:9200/my_search_indices/_search" | jq .
# Block writes on the index, as required for the split operation below
curl -s -X PUT "http://1.1.1.1:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d '{"settings": {"index.blocks.write": true}}'
# Try writing a document to confirm the write block is effective
curl -s -X PUT "http://1.1.1.1:9200/twitter/_doc/33?pretty" -H 'Content-Type: application/json' -d '{"id": 33, "name": "amd", "age": "33"}'
# Remove the alias from the twitter index
curl -s -X POST "http://1.1.1.1:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{"actions": [{"remove": {"index": "twitter", "alias": "my_search_indices"}}]}'
# Start the split: the target index is named new_twitter and gets 8 primary shards
curl -s -X POST "http://1.1.1.1:9200/twitter/_split/new_twitter?pretty" -H 'Content-Type: application/json' -d '{"settings": {"index.number_of_shards": 8, "index.number_of_replicas": 0}}'
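The split request returns as soon as the target index has been created; shard recovery then proceeds in the background. One way to wait until the new shards are searchable:

```shell
# Block until every primary shard of new_twitter is active
# (yellow = all primaries allocated; replicas are 0 here anyway).
curl -s "http://1.1.1.1:9200/_cluster/health/new_twitter?wait_for_status=yellow&timeout=60s&pretty"
```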
# Add the alias to the new index
curl -s -X POST "http://1.1.1.1:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{"actions": [{"add": {"index": "new_twitter", "alias": "my_search_indices"}}]}'
Result:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "new_twitter"
}
Note:
To check the progress of the split, use the _cat/recovery API, or watch it in the cerebro web interface.
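For example, the recovery progress of the target index's shards can be watched like this:

```shell
# One line per shard copy; the files_percent / bytes_percent columns
# show how far each local recovery has progressed.
curl -s "http://1.1.1.1:9200/_cat/recovery/new_twitter?v&active_only=true"
```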
# Query the new index; the data reads back normally
curl -s -XGET "http://1.1.1.1:9200/my_search_indices/_search" | jq .
# Try writing to the new index; the write fails because the block is still in place
curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{"id": 33, "name": "amd", "age": "33"}'
# Re-enable writes on the index
curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_settings?pretty" -H 'Content-Type: application/json' -d '{"settings": {"index.blocks.write": false}}'
# Write to the new index again; this time the writes succeed
curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{"id": 33, "name": "amd", "age": "33"}'
curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/44?pretty" -H 'Content-Type: application/json' -d '{"id": 44, "name": "intel", "age": "4"}'
# At this point the old index is still read-only. Once the new index is confirmed to be OK, the old twitter index can be closed or deleted.
A screenshot from a split executed in production shows that each shard of the new index is only half the size of a shard of the old index, spreading the indexing load:
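Closing or deleting the old index once the new one is verified uses the standard APIs; a sketch:

```shell
# Option 1: close the old index (data stays on disk but is no longer searchable).
curl -s -X POST "http://1.1.1.1:9200/twitter/_close?pretty"

# Option 2: delete it outright once the new index is fully verified.
# curl -s -X DELETE "http://1.1.1.1:9200/twitter?pretty"
```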