The concept and working principle of High availability Cluster

2025-03-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article covers the concept and working principle of high availability (HA) clusters, as well as their logical architecture.

Article navigation

What is a high availability cluster?

What are the characteristics of high availability clusters?

Logical Architecture of High availability Cluster

High Availability Cluster Solutions

Working Model of High availability Cluster

Requirements

Master the basic principles and logical architecture of high availability clusters.

What is a high availability cluster?

A high availability cluster means that when the current server fails, its services, resources, and IP addresses can be transferred to another server so that the business continues uninterrupted. Two or more such servers form a high availability cluster.

To clients, the cluster looks like a single server: because every node runs the same service, the business is unaffected even if some of the servers go down or lose connectivity.

What are the characteristics of high availability clusters?

I. Highly available services

The primary purpose of a cluster is high availability of services: to ensure that a service does not become unavailable because of line, hardware, or software failures.

II. Metrics (service availability)

Availability is measured by system reliability (mean time to failure, MTTF) and maintainability (mean time to repair, MTTR).

Calculation: HA = MTTF / (MTTF + MTTR) × 100%, where MTTF is the mean time to failure and MTTR is the mean time to repair.

99% availability: no more than about 3.65 days of service interruption per year.

99.9% availability: no more than about 8.76 hours per year.

99.99% availability: no more than about 52.6 minutes per year.

99.999% availability: no more than about 5.26 minutes per year.
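The formula and the downtime budgets above can be checked numerically; a minimal sketch (hours are the assumed unit):

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """HA = MTTF / (MTTF + MTTR), as a fraction between 0 and 1."""
    return mttf_hours / (mttf_hours + mttr_hours)

def annual_downtime_hours(avail: float) -> float:
    """Maximum downtime per year (365 days) allowed by a given availability."""
    return (1.0 - avail) * 365 * 24

# An MTTF of 999 hours with an MTTR of 1 hour yields 99.9% availability.
print(f"{availability(999, 1):.3f}")          # 0.999

for nines in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{nines} -> {annual_downtime_hours(nines):.2f} h/year")
```

Running it shows 99% allows 87.60 h/year (about 3.65 days) and 99.999% only about 0.09 h/year (roughly 5.26 minutes).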

III. Cluster nodes

Every host in a cluster is called a node. An HA cluster needs at least 2 nodes, and the node count should normally be odd. In production, an HA cluster should have at least 3 nodes, which reduces the probability of split-brain.

IV. Cluster services and resources

A cluster service usually comprises multiple resources, which together make up the service. For example, a highly available MySQL service has resources such as a VIP, mysqld, and shared storage. Managing a cluster service really means managing its resources.

V. Split-brain, resource contention, and resource isolation

Split-brain: for some special reason the cluster is divided into two smaller clusters that can no longer communicate with each other; this phenomenon is called split-brain.

Resource contention: when a cluster splits into two smaller clusters that cannot communicate, both may try to claim the same resources. If no decision is made promptly after the split, the two partitions may use the same file system simultaneously, corrupting files on the back-end shared storage or even crashing the entire file system. Obviously, this situation must not be allowed to happen.

Resource isolation: mainly solves the problem of resource contention. It comes in two forms, node-level isolation and resource-level isolation. Node-level isolation means that after the cluster splits (i.e., after split-brain occurs), resources are fenced through the STONITH ("Shoot The Other Node In The Head") mechanism, and the partition with insufficient votes is forced out of the cluster by the quorum mechanism. STONITH means restarting or powering off the evicted hosts through hardware devices, or blocking their cluster and resource communication at the switch.
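As a concrete illustration, node-level fencing on a Pacemaker cluster might be configured with `pcs` roughly as follows (the device name, IP address, and credentials are placeholders, not values from this article):

```shell
# Register an IPMI-based fence device for node1 (all values are placeholders)
pcs stonith create fence-node1 fence_ipmilan \
    pcmk_host_list="node1" ip="192.0.2.10" \
    username="admin" password="secret" lanplus=1

# Enable fencing cluster-wide so nodes that lose quorum are powered off
pcs property set stonith-enabled=true
```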

Solutions for resource isolation:

1. When a cluster splits into two smaller clusters, resource contention for the back-end storage system can be catastrophic and crash the system. To avoid this, cluster systems introduce a voting (quorum) mechanism: only the partition holding more than half of the legitimate votes survives; the other partition must withdraw from the cluster.

2. If the cluster has an even number of nodes and splits, both sides may hold an equal number of votes. The node count should therefore not be even; if it is, an additional ping node is needed to provide a tie-breaking vote.

3. After a partition with insufficient votes withdraws from the cluster service, the STONITH mechanism is still needed to fence its resources and ensure it does not contend for them.

Therefore, to prevent split-brain, the number of cluster nodes is generally odd, so that even if the cluster splits, the two partitions can never hold equal numbers of votes.
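The voting rule above can be stated in a few lines (a minimal sketch; real cluster stacks implement this in their quorum service):

```python
def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """A partition survives only if it holds a strict majority of all votes."""
    return 2 * partition_votes > total_votes

# 3-node cluster split 2/1: the two-node side keeps quorum, the other withdraws.
print(has_quorum(2, 3), has_quorum(1, 3))   # True False

# 4-node cluster split 2/2: neither side has a majority, so service stalls --
# this is why an odd node count (or an extra ping-node vote) is recommended.
print(has_quorum(2, 4))                     # False
```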

Logical Architecture of High availability Cluster

High Availability Cluster Solutions

1. Based on CentOS/RHEL 5:

1) Built-in: RHCS (cman + rgmanager)

2) Third-party: corosync + pacemaker, heartbeat (v1 or v2), keepalived

2. Based on CentOS/RHEL 6:

1) RHCS (cman + rgmanager)

2) corosync + rgmanager

3) cman + pacemaker

4) heartbeat v3 + pacemaker (before 6.4)

5) keepalived (6.4 and later)

Working Model of High availability Cluster

A/P: two nodes, working in the active/passive (master/standby) model.

N-M: N nodes and M services, with N > M; M nodes are active and the remaining N − M nodes stand by.

N-N:N nodes, N services

Double master model: both nodes are active

How resources are transferred:

rgmanager: failover domain and priority.

Failover domain: restricts the set of hosts to which a resource may be transferred.

Priority: sets which hosts within a failover domain are preferred when transferring resources.
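The failover-domain selection described above can be modeled in a few lines (an illustrative sketch, not rgmanager code; it assumes an ordered domain where a lower priority number means more preferred):

```python
def failover_target(domain: dict, online: set):
    """Return the online domain member with the best (lowest) priority value,
    or None if no domain member is online."""
    candidates = [node for node in domain if node in online]
    return min(candidates, key=lambda n: domain[n]) if candidates else None

# Ordered failover domain: node1 is preferred, then node2, then node3.
domain = {"node1": 1, "node2": 2, "node3": 3}

print(failover_target(domain, {"node1", "node2", "node3"}))  # node1
print(failover_target(domain, {"node2", "node3"}))           # node2
print(failover_target(domain, set()))                        # None
```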

Pacemaker:

Resource stickiness: how strongly a resource prefers to stay on its current node. If the location constraints toward two nodes are equal, the resource stays on whichever node its stickiness is positive for.

Resource constraints (3 types):

Location constraints: which node does the resource prefer to run on?

inf: positive infinity (run here whenever possible)

n: a positive score; the resource tends to run on that node

-n: a negative score; the resource tends to leave that node

-inf: negative infinity (never run here if any alternative exists)

Colocation constraints: the tendency of resources to run on the same node.

inf: the two resources always run together

-inf: the two resources never run together

Order constraints: the order in which resources are started and shut down.

Example: how do I get the three resources of a web service, vip, httpd, and filesystem, to run on the same node?

1. Colocation constraints: bind the three resources to one another with a score of inf.

2. Resource group: define the three resources in one group; the whole group is then started on a single node.

3. Define order constraints to guarantee the startup sequence: vip, then filesystem, then httpd.
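With Pacemaker's `pcs` tool, the three approaches might look roughly like this (the resource names come from the example above; exact syntax varies between pcs versions):

```shell
# 1. Colocation constraints: tie the resources together with score INFINITY
pcs constraint colocation add filesystem with vip INFINITY
pcs constraint colocation add httpd with filesystem INFINITY

# 2. Resource group: members are placed on one node and started in list order
pcs resource group add webservice vip filesystem httpd

# 3. Order constraints: start vip, then filesystem, then httpd
pcs constraint order vip then filesystem
pcs constraint order filesystem then httpd
```

In practice the resource group alone already implies both colocation and ordering of its members.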

Symmetry and asymmetry:

Symmetric: by default, resources can be transferred to any node.

Asymmetric: some nodes cannot receive transferred resources.

What happens to the resources running on a node when it is no longer a cluster member:

Stopped: stop the service immediately.

Ignore: ignore the membership change; whatever was running keeps running.

Freeze: continue serving established connections but accept no new requests.

Suicide: the node kills its own services and takes itself out of service.

Should a resource be started as soon as it is configured?

target-role: the target role, which determines whether the resource is started or left stopped.
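In Pacemaker this is the `target-role` resource meta-attribute; a sketch with `pcs` (the VIP resource name and address are illustrative):

```shell
# Create a resource but keep it from starting until explicitly enabled
pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.0.2.100 \
    meta target-role=Stopped

# Later, allow the cluster to start it
pcs resource meta vip target-role=Started
```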

Resource Agent Type (RA):

Heartbeat legacy: the traditional resource agent type.

LSB: the service scripts under /etc/rc.d/init.d/.

OCF: Open Cluster Framework resource agents.

STONITH: agents designed specifically for resource isolation (fencing).

Resource type:

Primitive (native): a primary resource, which can run on only one node at a time.

Group: a group resource.

Clone: a cloned resource, i.e. a primary resource replicated so that several (or all) nodes run a copy.

Typically used for STONITH resources, cluster filesystems, and distributed locks. Two parameters govern clones:

1) the maximum total number of clone copies in the cluster;

2) the maximum number of copies that may run on each node.

Master/slave: a master/slave resource; exactly two copies are cloned. The master can read and write; the slave cannot perform any operation.
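With `pcs`, clone and master/slave resources might be declared as follows (the resource names are illustrative; newer Pacemaker versions call master/slave resources "promotable clones"):

```shell
# Clone: clone-max caps the total copies, clone-node-max the copies per node
pcs resource clone dlm clone-max=3 clone-node-max=1

# Master/slave: at most one master; only the master may write (older pcs syntax)
pcs resource master drbd-ms drbd-data master-max=1 clone-max=2
```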
