The connection and difference between Distribution and Cluster

2025-07-09 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

To sum up: distribution improves efficiency by shortening the execution time of a single task, while clustering improves efficiency by increasing the number of tasks executed per unit time.

For example, if a task consists of 10 subtasks and each subtask takes 1 hour to execute on its own, then executing the whole task on a single server takes 10 hours.

Distributed scheme: provide 10 servers, each responsible for only one subtask. Ignoring dependencies between subtasks, the whole task takes only 1 hour. (A typical example of this working mode is Hadoop's MapReduce distributed computing model.)

Cluster scheme: provide 10 servers, each of which can handle the whole task independently. Suppose 10 tasks arrive at the same time: the 10 servers work in parallel and all 10 tasks finish together after 10 hours, so, taken as a whole, one task is completed per hour on average.
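The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration of the numbers used above; the constant and function names are made up for the sketch, and dependencies between subtasks are ignored, as in the text.

```python
# Numbers taken from the example in the text.
SUBTASKS_PER_TASK = 10
HOURS_PER_SUBTASK = 1
SERVERS = 10


def single_server_latency():
    """Hours for one server to run every subtask of one task in sequence."""
    return SUBTASKS_PER_TASK * HOURS_PER_SUBTASK


def distributed_latency():
    """Hours for one task when the subtasks are spread over all servers."""
    rounds = -(-SUBTASKS_PER_TASK // SERVERS)  # ceiling division
    return rounds * HOURS_PER_SUBTASK


def cluster_throughput():
    """Tasks finished per hour when every server runs a whole task by itself."""
    return SERVERS / single_server_latency()


print("single server, hours per task:", single_server_latency())
print("distributed, hours per task:", distributed_latency())
print("cluster, tasks per hour:", cluster_throughput())
```

Distribution lowers the time for one task (10 hours down to 1), while the cluster leaves each task at 10 hours but raises the number of tasks finished per hour.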

The following is an excerpt from an online article:

Cluster concept:

1. Two key characteristics

A cluster is a group of service entities that work together to provide a service platform with more scalability and availability than a single service entity. From the client's point of view, a cluster looks like a single service entity, but in fact it consists of a set of service entities. Compared with a single service entity, a cluster provides the following two key features:

Scalability: the performance of the cluster is not limited to that of a single service entity, and new service entities can be added to the cluster dynamically to increase its performance.

High availability: through redundancy of service entities, a cluster makes it unlikely that clients will encounter an "out of service" condition. In a cluster, the same service can be provided by multiple service entities. If one service entity fails, another takes over from it. This ability to recover by moving from a faulty service entity to a healthy one enhances the availability of the application.

2. Two major abilities

In order to have scalability and high availability, the cluster must have the following two capabilities:

Load balancing: distributes tasks more evenly across the computing and network resources of the cluster environment.

Error recovery: when a resource performing a task fails for some reason, a resource in another service entity that can perform the same task completes it. This process, in which the resources of one entity stop working and the resources of another entity transparently continue the task, is called error recovery.

Both load balancing and error recovery require that every service entity has resources able to perform the same task, and that all resources for the same task see the same information view (information context) needed to perform it.
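The two abilities can be sketched together as follows. This is a minimal illustration, not a real load balancer: the ServiceEntity class, the dispatch function, and the use of "tasks handled so far" as the load measure are all assumptions made for the sketch. The same context is passed to whichever entity runs the task, mirroring the requirement that the information view be identical.

```python
class ServiceEntity:
    """Hypothetical service entity; `healthy=False` simulates a failure."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.handled = 0  # tasks handled so far, used as a crude load measure

    def run(self, task, context):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        self.handled += 1
        return f"{task} done on {self.name}"


def dispatch(entities, task, context):
    """Try entities from least to most loaded; move on when one fails."""
    for entity in sorted(entities, key=lambda e: e.handled):
        try:
            return entity.run(task, context)
        except RuntimeError:
            continue  # error recovery: another entity completes the task
    raise RuntimeError("no service entity could complete the task")


a = ServiceEntity("a", healthy=False)  # a failed entity
b = ServiceEntity("b")
c = ServiceEntity("c")
shared_context = {"user": "demo"}  # identical for every entity

results = [dispatch([a, b, c], f"task{i}", shared_context) for i in range(4)]
print(results)
print({e.name: e.handled for e in (a, b, c)})
```

The failed entity is skipped transparently, and the four tasks end up spread evenly over the two healthy entities.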

3. Two major technologies

Implementing a cluster requires the following two technologies:

Cluster address: the cluster consists of multiple service entities, and a cluster client accesses the cluster address to use the functions of the service entities inside the cluster. Having a single cluster address (also known as a single image) is a basic feature of a cluster. The device that maintains the cluster address is called a load balancer. Internally, the load balancer manages service entities joining and leaving the cluster; externally, it translates the cluster address into the addresses of the internal service entities. Some load balancers implement real load balancing algorithms, while others only support task switching. Load balancers that only switch tasks are suitable for ACTIVE-STANDBY cluster environments, where only one service entity in the cluster is working at a time; when the working service entity fails, the load balancer shifts subsequent tasks to another service entity.

Internal communication: in order to work together and achieve load balancing and error recovery, the entities of the cluster must communicate frequently. Examples are the heartbeat checks the load balancer sends to the service entities, and the task execution context exchanged between service entities.

A single cluster address lets the client access the computing services provided by the cluster while the internal address of each service entity stays hidden behind it, so that the computing services requested by clients can be distributed among the service entities. Internal communication is the basis of the normal operation of the cluster; it is what gives the cluster its load balancing and error recovery abilities.
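A minimal sketch of how a cluster address and heartbeat-based internal communication could fit together in an ACTIVE-STANDBY setup is shown below. The class name, method names, and addresses are hypothetical, and time is passed in explicitly as `now` so the example stays deterministic; a real load balancer would use timers and network probes.

```python
class ActiveStandbyBalancer:
    """Hides internal addresses behind one cluster address (illustrative only)."""

    def __init__(self, cluster_address, heartbeat_timeout):
        self.cluster_address = cluster_address
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {}  # internal address -> time of last heartbeat

    def join(self, internal_address, now):
        """Register a service entity; joining counts as its first heartbeat."""
        self.last_heartbeat[internal_address] = now

    def leave(self, internal_address):
        """Remove a service entity from the cluster."""
        self.last_heartbeat.pop(internal_address, None)

    def heartbeat(self, internal_address, now):
        """Internal communication: record that a member is still alive."""
        if internal_address in self.last_heartbeat:
            self.last_heartbeat[internal_address] = now

    def resolve(self, address, now):
        """Translate the cluster address into the first live internal address."""
        if address != self.cluster_address:
            raise KeyError(address)
        for internal_address, seen in self.last_heartbeat.items():
            if now - seen <= self.heartbeat_timeout:
                return internal_address
        raise RuntimeError("no live service entity")


lb = ActiveStandbyBalancer("cluster.example", heartbeat_timeout=3)
lb.join("10.0.0.1", now=0)  # active entity
lb.join("10.0.0.2", now=0)  # standby entity

before = lb.resolve("cluster.example", now=1)
lb.heartbeat("10.0.0.2", now=4)  # 10.0.0.1 has stopped sending heartbeats
after = lb.resolve("cluster.example", now=5)
print(before, "->", after)
```

The client only ever uses the cluster address; when the active entity misses its heartbeats, later requests are shifted to the standby entity.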

Cluster classification

Linux clusters are mainly divided into three categories (high availability clusters, load balancing clusters, scientific computing clusters).

High availability Cluster (High Availability Cluster)

Load balancing Cluster (Load Balance Cluster)

Scientific Computing Cluster (High Performance Computing Cluster)

The details include:

Linux High Availability (HA) clusters: ordinary two-node hot standby, multi-node HA clusters, RAC, shared, shared-nothing clusters, etc.

Linux Load Balance (load balancing) clusters: LVS, etc.

Linux High Performance Computing (scientific computing) clusters: Beowulf-class clusters.

Distributed storage and other types of Linux clusters: openMosix, rendering farms, etc.

Detailed introduction:

1. High availability Cluster (High Availability Cluster)

The most common form is the 2-node HA cluster, which goes by many popular but imprecise names, such as "dual-machine hot standby", "dual-machine mutual backup", and "dual-machine". The problem a high availability cluster solves is keeping users' applications able to provide service continuously. (Please note that high availability clusters are not meant to protect business data; they protect users' business programs so that these keep providing service, minimizing the impact of software, hardware, and human failures on the business.)

2. Load balancing Cluster (Load Balance Cluster)

Load balancing system: all nodes in the cluster are active and share the workload of the system. Typical Web server clusters, database clusters, and application server clusters all belong to this type.

Load balancing clusters are generally used for web servers and database servers that respond to network requests. When a request arrives, the cluster can find the servers that are handling fewer requests and are not busy, and forward the request to them. In that it checks the status of other servers, a load balancing cluster is very close to a fault-tolerant cluster; the difference is that it has more nodes.

3. Scientific Computing Cluster (High Performance Computing Cluster)

High performance computing (High Performance Computing) cluster, referred to as an HPC cluster. Such clusters aim to provide powerful computing capacity that a single computer cannot provide.

Classification of high performance computing

High throughput computing (High-throughput Computing)

One class of high-performance computing can be divided into several parallel subtasks that are unrelated to each other. SETI@home (Search for Extraterrestrial Intelligence at Home) is an application of this type. The project uses idle computing resources on the Internet to search for extraterrestrial intelligence. The server of the SETI project sends a set of data and data patterns to the computing nodes participating in SETI over the Internet. The computing nodes search the given data for the given patterns and send the search results back to the server. The server is responsible for aggregating the data returned from each computing node into a complete result. Because a common feature of this type of application is searching for certain patterns in large amounts of data, this kind of computing is called high throughput computing. So-called Internet computing falls into this category. According to the Flynn classification, high throughput computing belongs to the category of SIMD (Single Instruction/Multiple Data).

Distributed computing (Distributed Computing)

The other class of computing is just the opposite of high throughput computing: although the work can also be divided into several parallel subtasks, the subtasks are closely related and require a lot of data exchange. According to the Flynn classification, distributed high performance computing belongs to the category of MIMD (Multiple Instruction/Multiple Data).
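The high throughput pattern described in this section (a server hands out independent chunks of data, nodes search them for a pattern, and the server aggregates the results) can be sketched as below. This is an assumption-laden toy, not how SETI@home is implemented: a local thread pool stands in for the remote computing nodes, and the function names are invented for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(job):
    """Work done on one computing node: find the pattern in its own chunk."""
    offset, chunk, pattern = job
    hits = []
    start = chunk.find(pattern)
    while start != -1:
        hits.append(offset + start)  # report positions in the full data set
        start = chunk.find(pattern, start + 1)
    return hits


def server_search(data, pattern, chunk_size):
    """Split the data, farm out the chunks, and merge the returned hits."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1  # lets a match that straddles a boundary be found
    jobs = [(i, data[i:i + chunk_size + overlap], pattern)
            for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=4) as nodes:
        results = list(nodes.map(search_chunk, jobs))
    return sorted(hit for hits in results for hit in hits)


positions = server_search("abcabcabc", "abc", chunk_size=4)
print(positions)
```

Because no chunk needs data from another chunk, the nodes never talk to each other; only the server sees all results. Closely coupled distributed computing differs precisely in that its subtasks would have to exchange data while running.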

4. The connection and difference between distribution and cluster:

Distribution refers to the distribution of different businesses in different places.

Cluster refers to the centralization of several servers to achieve the same business.

Every node in the distributed system can be clustered. Clusters are not necessarily distributed.

For example, if Sina gets more visitors, it can build a cluster with a response server in front and several servers behind it that all handle the same business. When a request comes in, the response server checks which server is lightly loaded and hands the request to that server. Distribution, in the narrow sense, is similar to a cluster, but its organization is looser. A cluster is organized so that when one server crashes, other servers can take over. In a distributed system, each node handles a different business, and if a node goes down, that business becomes inaccessible.
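The availability difference can be illustrated with a small sketch. The business names and the representation of a system as a mapping from business to a list of server states (True meaning "up") are assumptions made purely for this example.

```python
def is_available(system, business):
    """A business is reachable if at least one server behind its node is up."""
    return any(system.get(business, []))


# Purely distributed: each business runs on a single server.
distributed = {"login": [True], "search": [False]}

# Distributed, with every node clustered: each business has several servers.
clustered_nodes = {"login": [True, True], "search": [False, True]}

print(is_available(distributed, "search"))      # the only search server is down
print(is_available(clustered_nodes, "search"))  # another server takes over
```

Clustering each distributed node combines the two ideas: different businesses still live in different places, but a single server failure no longer makes a business inaccessible.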
