Cluster techniques and principles in distributed systems that you must understand


Write at the front

In today's era of information explosion, a single computer can no longer carry ever-growing business workloads. Powerful supercomputers do exist, but such high-end machines are expensive and inflexible; ordinary enterprises can neither afford to buy them nor afford to lose them. Combining a group of cheap, ordinary computers so that they cooperate to provide service like one supercomputer has therefore become the natural choice. The price is extra software complexity: the software must be able to scale out horizontally. Kafka, Elasticsearch, ZooKeeper and the like all belong to this category; they are "distributed" by nature, meaning they can spread storage and load pressure simply by adding machine nodes.

Why do you need clusters?

Computers in different locations communicate over the network and cooperate to provide service as a whole: that is a cluster. If the system we build has this capability, then in theory it can scale out almost without limit, its throughput grows as machines are added, and it can cope well when load becomes high in the future.

Why can't CAP be satisfied at the same time?

From the analysis above, we know that a cluster really means multiple computers sharing and carrying the system's load, so multiple computers must process data together. To guarantee availability, each piece of data is usually replicated across machines, so that as long as one node stays in sync the data is not lost; examples are Kafka's partition replicas and Elasticsearch's replica shards. Because a data block and its copies live on different machines, over time, and given unreliable network communication, the data across machines will inevitably become inconsistent. Now suppose the extreme case occurs and all the machines go down: how do we make sure no data is lost? There are really only two choices:

1. Guarantee availability: take the first machine that comes back to normal service (which may not have all the data) as the trusted data source and restore the cluster quickly; that is, fast recovery takes priority over full synchronization.

2. Guarantee data consistency: wait for the first machine that has all the data to come back to normal before restoring the cluster; that is, full synchronization takes priority over fast recovery. Disabling Kafka's unclean leader election mechanism is exactly this strategy.

In practice, when most machines are unavailable there has to be a trade-off between availability and consistency, which is why the BASE theory, a better fit for distributed systems, was proposed.

How to solve the distributed storage problem?

When a cluster composed of multiple computers provides services to the outside world, what it actually provides is the ability to read and write data.

Data blocks (sharding)

To write data evenly across machines and improve the cluster's write capacity, to balance read requests across nodes and improve its read capacity, and more generally to decouple data storage from physical nodes and improve parallel read-write processing, engineers introduced a logical unit of data storage, generically called a data block, such as a Kafka partition or an Elasticsearch shard. This layer of virtualization greatly improves the flexibility of cluster reads and writes.
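As a minimal sketch (the hash function, shard count, and node names here are illustrative assumptions, not any particular system's algorithm), the value of the data-block abstraction is that key-to-shard routing and shard-to-node placement become two separate mappings:

```python
# Sketch: decoupling data from physical nodes via logical shards.
# Keys hash to a fixed set of shards; shards (not keys) are assigned to
# nodes, so node membership can change without re-hashing every key.
import hashlib

NUM_SHARDS = 6  # fixed at creation time, like ES primary shards or Kafka partitions

def shard_for(key: str) -> int:
    """Map a document/message key to a logical shard (deterministic)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Placement is a routing table the cluster maintains and can rebalance freely.
shard_to_node = {0: "node-1", 1: "node-2", 2: "node-3",
                 3: "node-1", 4: "node-2", 5: "node-3"}

def node_for(key: str) -> str:
    """Resolve the physical node currently hosting a key's shard."""
    return shard_to_node[shard_for(key)]
```

Moving a shard only requires updating `shard_to_node`; the key-to-shard mapping never changes, which is the flexibility the paragraph above describes.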

Note: the name varies from system to system and is not important; understanding why the concept exists is what matters.

Coordinating node

When the cluster processes data as a whole, any node may receive a read or write request, yet the data is scattered across nodes. Some node therefore has to know exactly where every data block lives and forward each request to the right place; that is the job of the "coordinating node". In Elasticsearch, for example, the master node manages changes at the cluster level, while each primary shard manages changes within the scope of its own data block.

Majority voting (quorum)

Baidu Encyclopedia: a quorum is the minimum number of members that must be present for a meeting, vote, election, or specialized body's decision to be valid; proceedings held without a quorum are void.

Because network partitions exist, this mechanism is widely used in distributed systems, for example to elect a master, or to elect leaders among a data block's copies. In distributed storage it is known as the quorum read-write mechanism: on write, ensure a majority of nodes write successfully (a common practice is to elect a master data block, the leader, confirm its write, then synchronize to the redundant replica blocks); on read, ensure data is read from a majority of nodes (a common practice is for the coordinating node to fan the request out to different nodes, globally sort all the retrieved data, and return it). The write set and the read set must then overlap in at least one node holding the latest data, so the latest data can always be read.

From this analysis we can conclude: as long as a majority of nodes are alive and available, the availability of the whole cluster is unaffected; as long as a majority of each data block's copies are alive, read and write service can continue; and as long as one copy is fully synchronized, no data is lost. This is essentially using redundancy to handle failures of the fail/recover kind. In general, tolerating one node failure requires deploying at least three nodes, and tolerating two failures requires at least five; the more nodes, the stronger the partition tolerance. This is also why clusters are usually deployed with an odd number of nodes and data-block copies: an extra even-numbered node adds cost but no additional failure tolerance, and a majority must remain alive or no master (or leader block) can be elected.
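The arithmetic behind "tolerate f failures with 2f + 1 nodes" is simple enough to sketch directly:

```python
# Sketch: majority size and tolerated failures as a function of cluster size.
def majority(n: int) -> int:
    """Smallest number of nodes that is strictly more than half of n."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return n - majority(n)  # equals (n - 1) // 2

assert majority(3) == 2 and tolerated_failures(3) == 1   # 3 nodes survive 1 failure
assert majority(5) == 3 and tolerated_failures(5) == 2   # 5 nodes survive 2 failures
assert tolerated_failures(4) == tolerated_failures(3)    # a 4th node adds no tolerance
```

The last assertion is the reason for the odd-number deployment advice: going from 3 to 4 nodes raises the majority from 2 to 3 without raising the number of survivable failures.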

Majority voting is also applied flexibly. When a distributed system pursues strong consistency, a write completes only after all copies of the data have been written successfully (write all); think of it as a transactional guarantee, all or nothing. For example, starting from Kafka 0.11.0.0, when a producer sends messages to multiple topic partitions, mechanisms of this kind underpin exactly-once delivery semantics. In this case the latest data can be read from any node, and read performance is highest. When the system pursues eventual consistency instead, a write only has to succeed on the master data block (the leader), which then synchronizes to the replica blocks through reliable message delivery.

To meet different scenarios' requirements for data reliability and throughput, and to balance durability against availability, many components expose configuration items that let users define this "legal majority" themselves. Let's look at the configuration of a few common components:

Elasticsearch

Consider a cluster of three nodes, each running an Elasticsearch instance, holding an index with two primary shards and two replicas per primary, six shards in total. Copies of the same shard are automatically placed on different nodes by Elasticsearch, which is exactly what we want. When we create a new document:

1. The client sends a write request for a new document to Node 1.

2. Node 1 uses the document's _id to determine that the document belongs to shard 0, and forwards the request to Node 3, because the primary copy of shard 0 is currently assigned to Node 3.

3. Node 3 executes the request on the primary shard. If it succeeds, it forwards the request in parallel to the replica shards on Node 1 and Node 2. Once all replica shards report success, Node 3 reports success to the coordinating node, which reports success to the client.
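Step 2 relies on Elasticsearch's documented routing rule, `shard = hash(routing) % number_of_primary_shards`, where `routing` defaults to the document's `_id`. A sketch of that rule (using md5 as a stand-in hash; Elasticsearch actually uses murmur3 internally):

```python
# Sketch of the Elasticsearch routing rule used by the coordinating node.
# Because the shard is derived from number_of_primary_shards, that number
# is fixed at index creation and cannot change afterwards.
import hashlib

NUMBER_OF_PRIMARY_SHARDS = 2  # matches the example cluster above

def route(doc_id: str) -> int:
    """Return the primary shard a document _id routes to."""
    h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
    return h % NUMBER_OF_PRIMARY_SHARDS
```

The same formula is evaluated on reads, which is how any node can forward a GET for a document straight to the shard that owns it.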

This is the typical sequence in which Elasticsearch handles a write request. Different business scenarios have different requirements for data reliability and system performance, so Elasticsearch provides a write consistency configuration item:

1. one: a write request can be processed as long as the primary shard is active and available. Throughput is highest, but data may be lost; suitable for scenarios with low reliability requirements, such as real-time time-series data (logs).

2. all: a write request is processed only when the primary shard and all replica shards are active and available. Throughput is lowest, but data is not lost; appropriate for critical business data.

3. quorum: a majority of a shard's copies must be active for the write to be processed. This balances system throughput against data reliability, and is the setting typical business systems use.
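For the quorum setting, Elasticsearch's older write-consistency documentation defined the required number of active shard copies as `int((primary + number_of_replicas) / 2) + 1`. A quick check of that formula:

```python
# Sketch: the Elasticsearch quorum formula for write consistency,
# evaluated per shard copy group (1 primary + its replicas).
def es_write_quorum(number_of_replicas: int) -> int:
    primary = 1
    return (primary + number_of_replicas) // 2 + 1

assert es_write_quorum(2) == 2  # 1 primary + 2 replicas: any 2 of 3 copies suffice
assert es_write_quorum(1) == 2  # 1 primary + 1 replica: quorum equals ALL copies
```

The second case is a documented gotcha: with a single replica, quorum degenerates into all, so losing either copy blocks writes.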

Kafka

When writing data to Kafka, producers customize the data-reliability level via the acks setting:

acks=0: do not wait for any confirmation from the broker.

acks=1: return once the leader has persisted the message.

acks=-1 (all): return only after all in-sync replicas have persisted the message.

Note: by default, to keep partitions maximally available, with acks=all Kafka acknowledges a write as soon as every replica currently in the ISR set has written it, even if the ISR has shrunk. To genuinely guarantee write all, set the topic's min.insync.replicas to the full replication factor, so that writes are only accepted while the ISR contains every replica.
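The interaction between acks and min.insync.replicas can be modeled with a toy function (an illustrative sketch of the acknowledgement rules, not the Kafka client API):

```python
# Sketch: when does a produce request get acknowledged, as a function of
# acks, the current ISR size, and the broker's min.insync.replicas setting.
def write_acknowledged(acks, isr_size: int, min_insync_replicas: int,
                       leader_ok: bool = True) -> bool:
    if acks == 0:            # fire-and-forget: no confirmation is awaited at all
        return True
    if acks == 1:            # only the leader has to persist the write
        return leader_ok
    if acks in (-1, "all"):  # every current ISR member must persist the write,
        # and the ISR must be at least min.insync.replicas large, otherwise the
        # broker rejects the write with NotEnoughReplicas
        return leader_ok and isr_size >= min_insync_replicas
    raise ValueError(f"unknown acks setting: {acks}")

# acks=all still succeeds with a shrunken ISR unless min.insync.replicas forbids it:
assert write_acknowledged("all", isr_size=1, min_insync_replicas=1)
assert not write_acknowledged("all", isr_size=1, min_insync_replicas=2)
```

This is why acks=all alone is not "write all": durability depends on the ISR size floor, which min.insync.replicas controls.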

As for the case where all replicas go down, that is governed by the unclean leader election mechanism. I recommend reading the Design section of Kafka's official documentation to understand in depth how Kafka adapts majority voting by introducing the ISR set, and how that underpins the different message-delivery semantics.

What is cluster split-brain?

For a distributed system, the key to handling faults automatically is knowing each node's liveness (alive) status accurately. When a node becomes unreachable, it is not necessarily down; very often it is a transient network failure. If a new master is elected immediately in that situation, then when the network recovers there will be two masters at the same time, a phenomenon vividly called cluster "split-brain". Think about how you would prevent it before reading on.
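One standard defense is to require a quorum before electing a master, so that two network partitions can never both elect one. Pre-7.x Elasticsearch exposed this as the discovery.zen.minimum_master_nodes setting, which should be set to a majority of the master-eligible nodes:

```python
# Sketch: the quorum guard against split-brain. Only a partition containing
# a majority of master-eligible nodes may elect a master; any two disjoint
# partitions cannot both contain a majority.
def minimum_master_nodes(master_eligible: int) -> int:
    return master_eligible // 2 + 1

def can_elect_master(partition_size: int, master_eligible: int) -> bool:
    return partition_size >= minimum_master_nodes(master_eligible)

# A 3-node cluster split 2 / 1: only the 2-node side reaches the quorum of 2.
assert can_elect_master(2, master_eligible=3)
assert not can_elect_master(1, master_eligible=3)
```

Newer Elasticsearch versions compute this quorum automatically, but the underlying majority rule is the same.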

Note: when designing a highly available distributed system, the failure modes to consider are often very complex. Most components only handle failures of the fail/recover kind, i.e. they tolerate some nodes being unavailable and wait for them to recover; they cannot handle Byzantine failures, where nodes cannot trust one another's behavior. Perhaps blockchain techniques can address that; it is worth studying further so we can discuss, learn, and progress together.

Write at the end

Having shared all this, can you summarize the advantages and disadvantages of majority voting yourself? Feel free to leave a message in the comments.

Original author: no obsession, no success

Original link: https://www.cnblogs.com/justmine/p/9275730.html
