What is the principle of Elasticsearch distributed architecture?

2025-01-19 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article explains the principles behind Elasticsearch's distributed architecture, a topic many people find hard to grasp. The summary below walks through how indexes and shards are created, how high availability is ensured, how a cluster is expanded, and how shard capacity is estimated.

Although Elasticsearch can get better performance from more powerful hardware, vertical scaling (scale-up) has its limits. Real scalability comes from scaling out horizontally: adding more nodes distributes the load and increases reliability.

For most databases, scaling out means making major changes to your application so that it can take advantage of the newly added machines. By contrast, Elasticsearch is inherently distributed: it knows how to manage multiple nodes to provide scalability and high availability, so your application does not need to care about this.

1 How to add an index in a distributed ES cluster

The basic unit for storing data in ES is the index. To add data to ES, we first need to create an index.

Here we need to explain the relationship between an index and a shard in ES: a shard is a low-level "worker unit" that holds only a portion of all the data in the index. All documents are stored and indexed in shards, but the application does not talk to shards directly; it talks to the index.

Let's take hotel search as an example and create an index for all hotels, hotel_idx:

PUT /hotel_idx
{"settings": {"number_of_shards": 3, "number_of_replicas": 1}}

We start three ES nodes, and hotel_idx is allocated three primary shards, each with one replica shard.

1. The ES client picks a node, NODE1 in this example, which then acts as the coordinating node for the write. How does ES know which shard to route a document (one hotel record) to? It uses this formula:

shard = hash(routing) % number_of_primary_shards

Routing is a variable value that defaults to the document's _id but can be set to a custom value, in this case the hotel's hotel_id. The routing value is passed through a hash function to produce a number, which is divided by number_of_primary_shards (the number of primary shards) to get the remainder. This remainder, always between 0 and number_of_primary_shards - 1, is the number of the shard where the document lives.
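As a rough illustration of the formula, the sketch below maps a few hotel IDs to shard numbers. The function name `route_to_shard` and the sample IDs are made up for this example, and CRC32 is only a stand-in hash chosen because it gives the same result on every run; Elasticsearch applies its own hash function internally, so real shard numbers will differ.

```python
import zlib


def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    """Return the primary shard number for a routing value."""
    # Assumption: CRC32 stands in for the hash function ES uses internally.
    hashed = zlib.crc32(routing.encode("utf-8"))
    # The remainder is always in the range 0 .. number_of_primary_shards - 1.
    return hashed % number_of_primary_shards


# Hypothetical hotel IDs used as routing values, with 3 primary shards.
for hotel_id in ("hotel-1001", "hotel-1002", "hotel-1003"):
    print(hotel_id, "-> P%d" % route_to_shard(hotel_id, 3))
```

The same routing value always lands on the same shard, which is what lets any node locate a document without a lookup table.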

2. After the document is written to the primary shard (P0 in this example), the write is replicated to its replica R0. Once replication succeeds, the result is returned to the coordinating node Node1 and finally to the client.

3. Read requests from the client can be served by either primary or replica shards.
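The three steps above can be pictured with a small in-memory model. This is only a sketch: the class names `ShardGroup` and `Coordinator` are invented for illustration, CRC32 again stands in for the real routing hash, each shard copy is a plain dictionary, and the round-robin read strategy is an assumption rather than a description of how ES selects a copy.

```python
import itertools
import zlib


class ShardGroup:
    """One primary copy plus its replica copies, each modelled as a dict."""

    def __init__(self, number_of_replicas: int):
        self.primary = {}
        self.replicas = [{} for _ in range(number_of_replicas)]
        # Assumption: reads alternate between all copies in turn.
        self._read_cycle = itertools.cycle([self.primary] + self.replicas)

    def write(self, doc_id, doc):
        self.primary[doc_id] = doc        # step 2: write the primary first
        for replica in self.replicas:     # then copy to every replica
            replica[doc_id] = doc
        return True                       # acknowledge only after replication

    def read(self, doc_id):
        # step 3: any copy, primary or replica, can serve the read
        return next(self._read_cycle).get(doc_id)


class Coordinator:
    """Plays the role of the coordinating node that routes requests."""

    def __init__(self, number_of_primary_shards: int, number_of_replicas: int):
        self.groups = [ShardGroup(number_of_replicas)
                       for _ in range(number_of_primary_shards)]

    def _group_for(self, routing: str) -> ShardGroup:
        # step 1: pick the shard with hash(routing) % number_of_primary_shards
        shard = zlib.crc32(routing.encode("utf-8")) % len(self.groups)
        return self.groups[shard]

    def index(self, doc_id: str, doc: dict) -> bool:
        return self._group_for(doc_id).write(doc_id, doc)

    def get(self, doc_id: str):
        return self._group_for(doc_id).read(doc_id)


cluster = Coordinator(number_of_primary_shards=3, number_of_replicas=1)
cluster.index("hotel-1001", {"name": "Example Hotel"})
print(cluster.get("hotel-1001"))
```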

2 How to ensure high availability

If the master node NODE1 goes down, ES holds a new election, and another node, say NODE2, is elected master. (Distributed election is a topic of its own and may be covered in a later article.)

If a non-master node goes down (for example node2), the master node node1 promotes replica R1 on Node3 to primary shard P1 so that it can accept write operations. If NODE2 later recovers, its previous P1 becomes the R1 replica.
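The promotion step can be sketched as a simple routing-table update. The layout below (which node holds which copy) is an assumption made for this example, chosen so that P1 sits on node2 and R1 on node3 as in the text; `fail_node` is a hypothetical helper, not an ES API.

```python
def fail_node(routing_table: dict, failed_node: str) -> dict:
    """Return a new routing table after `failed_node` leaves the cluster."""
    updated = {}
    for shard, copies in routing_table.items():
        primary = copies["primary"]
        # Drop any replica that lived on the failed node.
        replicas = [n for n in copies["replicas"] if n != failed_node]
        if primary == failed_node:
            if not replicas:
                raise RuntimeError(f"shard {shard} has no surviving copy")
            # Promote the first surviving replica to primary.
            primary, replicas = replicas[0], replicas[1:]
        updated[shard] = {"primary": primary, "replicas": replicas}
    return updated


# Assumed layout: shard number -> node holding the primary and its replicas.
routing_table = {
    0: {"primary": "node1", "replicas": ["node2"]},
    1: {"primary": "node2", "replicas": ["node3"]},
    2: {"primary": "node3", "replicas": ["node1"]},
}

after = fail_node(routing_table, "node2")
print(after[1])  # shard 1 is now served by the copy on node3
```

In this model shard 0 also loses its replica when node2 fails, which shows why the cluster has reduced redundancy until the node comes back.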

3 How to expand

The number of primary shards must be specified when the index is created, so it cannot simply be increased afterwards. When the data outgrows the current ES nodes, a common production practice is to create a new index with more shards than the current one and then re-import the data. This has a drawback, however: re-importing takes time, possibly more than we can afford.
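One way to see why the primary shard count is fixed: the routing formula depends on number_of_primary_shards, so changing it changes where existing documents are expected to live. The sketch below counts how many of 10,000 made-up hotel IDs would map to a different shard when going from 3 to 4 primary shards. CRC32 is a stand-in hash and the helper names are hypothetical.

```python
import zlib


def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    # Assumption: CRC32 stands in for the hash function ES uses internally.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def count_moved(doc_ids, old_shards: int, new_shards: int) -> int:
    """Count documents whose target shard differs between the two settings."""
    return sum(
        1 for doc_id in doc_ids
        if route_to_shard(doc_id, old_shards) != route_to_shard(doc_id, new_shards)
    )


doc_ids = ["hotel-%d" % i for i in range(10000)]
moved = count_moved(doc_ids, old_shards=3, new_shards=4)
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Any document counted here could no longer be found at its routed location without being re-imported, which is why the data has to be moved into a new index.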

Our general practice is to pre-allocate shards: with capacity planning done up front, pre-allocation lets us avoid this problem entirely.

The number of replica shards, by contrast, can be changed dynamically. In read-heavy scenarios, adding replicas appropriately increases read throughput.

PUT /hotel_idx/_settings
{"number_of_replicas": 2}

4 How to estimate shard capacity

This is a difficult question to answer, because so many factors are involved: the hardware you use, the size and complexity of your documents, how the documents are indexed, the types of queries you run, the aggregations you perform, your data model, and so on.

Experience in production suggests:

Create a cluster with a single node based on the hardware you intend to use in the production environment.

Create an index with the same settings and analyzers as you intend to use in the production environment, but give it only one primary shard and no replica shards.

Index real documents (or documents as close to real as possible).

Run actual queries and aggregations (or as close to reality as possible).

Essentially, you replicate real-world usage and push all of it onto a single shard until it "dies". What counts as "dying" is also up to you: some users need all responses to return within 50 milliseconds, while others are happy to wait five seconds.

Once you have determined the capacity of a single shard, it is easy to calculate the number of shards for the entire index. Take the total amount of data you need to index, add the expected growth, and divide by the capacity of a single shard; the result is the number of primary shards you need.
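The calculation can be written out as a few lines of arithmetic. The function name, the gigabyte units, and the sample figures below are all assumptions for illustration; the result is rounded up because a fractional shard is not possible.

```python
import math


def primary_shards_needed(total_data_gb: float,
                          expected_growth_gb: float,
                          single_shard_capacity_gb: float) -> int:
    """(total data + expected growth) / capacity of one shard, rounded up."""
    if single_shard_capacity_gb <= 0:
        raise ValueError("single_shard_capacity_gb must be positive")
    needed = (total_data_gb + expected_growth_gb) / single_shard_capacity_gb
    return max(1, math.ceil(needed))


# Hypothetical figures: 400 GB today, 200 GB of growth, 50 GB per shard.
print(primary_shards_needed(400, 200, 50))  # 12
```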
