In this article, Lao Wang will show, from a practical standpoint, how cluster quorum actually behaves in real situations and how to deal with nodes going down at different points. In this and future articles I will not cover how to install a cluster; instead we will focus on deeper cluster knowledge, conceptual understanding, architecture design, and migration and optimization.
This article covers the following scenarios.
Scenario 1. A cluster with two sites and three nodes: two nodes at the Beijing site and one node at the Baoding site, using the Node Majority quorum model. We will reproduce, in turn, what happens when one cluster node goes down and how to handle it, and then what happens when two cluster nodes go down and how to handle that.
Scenario 2. A cluster with two sites and four nodes: two nodes at the Beijing site and two nodes at the Baoding site, using disk witness quorum. We will reproduce, in turn, what happens when one node goes down while the witness disk is alive, what happens when two nodes go down, what happens when the witness disk is lost as well, and how to handle each case.
Scenario 1 environment
Beijing site
Node1
Management network card 10.0.0.3, gateway 10.0.0.1, DNS 10.0.0.2
Storage network card 40.0.0.3, gateway 40.0.0.1
Heartbeat network card 18.0.0.1
Node2
Management network card 10.0.0.4, gateway 10.0.0.1, DNS 10.0.0.2
Storage network card 40.0.0.4, gateway 40.0.0.1
Heartbeat network card 18.0.0.2
08DC
Lan1 10.0.0.2, gateway 10.0.0.1
Gateway server
10.0.0.1
20.0.0.1
30.0.0.1
40.0.0.1
Baoding site
Node3
Management network card 20.0.0.5, gateway 20.0.0.1, DNS 10.0.0.2
Storage network card 30.0.0.5, gateway 30.0.0.1
Heartbeat network card 18.0.0.3
This design does not follow every best practice; Lao Wang is simply reproducing a multi-site scenario, separating the management and storage networks between the two sites. In the later experiments 08DC will also take on the role of iSCSI server. Strictly speaking, the gateway server and the storage should be placed somewhere relatively stable and secure, so that the gateway or the storage does not become a single point of failure for the cluster.
You can also see that the heartbeat cards of the two sites use the same network segment, which would not be the case in a real enterprise environment; usually a large stretched VLAN is used to achieve this. Note, however, that every cluster node must have at least one network card with no gateway configured, because when the cluster is created it uses a heuristic that, by default, picks the card without a gateway as the heartbeat card. If every card has a gateway configured, you will find that cluster creation fails. So if the heartbeat network also needs to cross subnets, you can add persistent routes manually with route -p on each node to solve the problem.
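As a rough illustration only, such a persistent route might look like the following, assuming a hypothetical Baoding heartbeat subnet 19.0.0.0/24 reachable through a router at 18.0.0.254 (both addresses are illustrative and not part of the lab above), run in an elevated prompt on a Beijing node:
# add a persistent route to the remote heartbeat subnet via the local heartbeat router
route -p add 19.0.0.0 mask 255.255.255.0 18.0.0.254
# verify the route is in the persistent routes table
route print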
Reference https://blogs.technet.microsoft.com/timmcmic/2009/04/26/windows-2008-multi-subnet-clusters-and-using-static-routes/
In addition, we need to consider cross-site DNS caching. Because the environment is limited, Lao Wang only uses one DNS server here; strictly speaking, each site should have its own DNS server. For example, the clustered role testdtc is currently online at the Beijing site with the address 10.0.0.9, so the Beijing DNS server records this VCO with a 10 segment address. That record is then replicated to the Baoding DNS server at regular intervals, and the replication interval is a time lag. Cross-site failover time therefore also has to account for the DNS server's replicated records and the client's own cache: before Beijing replicates to Baoding, Baoding will still cache or resolve testdtc to the 10 segment address, and that is the address returned to clients. When the clustered application fails over to Baoding, which is on the 20 segment, the CNO has to re-register the VCO's DNS record before the cluster resource name can come online for external use. For this kind of cross-subnet cluster application we usually bind multiple IP addresses and set the dependency to OR, so that as long as one of the IPs can bind, the network name registers with DNS and the cluster asks DNS to update the VCO's address. At that point the VCO can come online normally, but whether clients can actually reach it is another matter, because the client also has its own DNS cache. Lao Wang will write a dedicated blog post with an in-depth look at multi-site clustering, covering the DNS cache and record lifetime of cross-site cluster VCOs and pointing out some best practices in the networking part; it is only mentioned briefly here.
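As a sketch of the knobs involved (assuming the role's network name resource is also called testdtc, which is an assumption rather than something shown above), the Network Name resource exposes private properties that influence this behavior on 2008 R2 and later:
Import-Module FailoverClusters
# shorten the DNS TTL of the network name record so clients refresh sooner after a cross-site failover (default is 1200 seconds)
Get-ClusterResource "testdtc" | Set-ClusterParameter HostRecordTTL 300
# register the IP addresses of all subnets in DNS instead of only the currently online one; clients must cope with multiple records
Get-ClusterResource "testdtc" | Set-ClusterParameter RegisterAllProvidersIP 1
Whether RegisterAllProvidersIP is appropriate depends on whether the client application can handle multiple A records; lowering HostRecordTTL is the more generally applicable of the two.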
First, we can see that the node servers are all in a normal state, and the multi-site cluster network is laid out as planned.
We have created a clustered DTC application, and you can see that it is currently running on node1.
If you power off node1 directly, you can see that the clustered DTC is transferred to node2, which continues to provide the service.
Open the system log on node2 to see the event recording that node1 was detected as offline.
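The same thing can be confirmed quickly from PowerShell (a sketch; the FailoverClusters module ships with 2008 R2 and later, and the event channel name may vary slightly by OS version):
Import-Module FailoverClusters
# node1 should now show as Down
Get-ClusterNode | Format-Table Name, State
# recent cluster events; in the System log, event 1135 typically records a node being removed from active membership
Get-WinEvent -LogName "Microsoft-Windows-FailoverClustering/Operational" -MaxEvents 20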
At this point, although the cluster can still run, quorum has raised a warning. The meaning is roughly: the quorum agreement you signed with me was a three-node Node Majority cluster; you have now lost one node and cannot afford to lose another, or the cluster will shut down. In a real scenario, if a three-node cluster loses one node, you should repair that node and bring it back online immediately; otherwise, one more failure and the cluster will no longer provide services.
Now we cut the power and shut down node2 directly, losing the entire Beijing site. Opening the cluster administrator, we can see that the cluster has gone down. At this point the cluster no longer provides services, and both the CNO and the VCOs are inaccessible.
Opening the Event Viewer system log shows that the cluster service has been forced to stop because the quorum agreement was violated.
Cluster troubleshooting mainly relies on the system log, the cluster detailed analysis log, the cluster validation report, the cluster directory log, and cluster.log, of which the system log and the detailed analysis log are the easiest to understand; it is suggested to start with those. Later there will be an article devoted to cluster log analysis.
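For reference, cluster.log is not written continuously; it is generated on demand, for example like this (a sketch; the destination folder is illustrative):
Import-Module FailoverClusters
# generate cluster.log from every node, covering the last 60 minutes, into a local folder
Get-ClusterLog -Destination C:\ClusterLogs -TimeSpan 60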
At this point, after losing node after node, the cluster has only the last node left and is no longer providing services. However, we can force-start the cluster service on that last node and let the cluster continue to serve the outside world. Although one node can only carry a limited load, having part of the service reachable is better than having nothing at all.
Run the following commands directly:
net stop clussvc
net start clussvc /fq
That is, stop the cluster service first, and then start it with forced quorum.
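The PowerShell equivalent is roughly the following (a sketch, run on the surviving node; node3 is the Baoding node from the environment above):
Import-Module FailoverClusters
# force the last surviving node to form quorum on its own
Start-ClusterNode -Name node3 -ForceQuorum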
After the forced start completes, we can see that the cluster is usable again, but it warns us that it is currently running in the ForceQuorum state and that cluster configuration may be lost.
Lao Wang's guess is that this is a consequence of using Node Majority quorum (a file share witness can hit the same problem): in this quorum mode the cluster database is only synchronized between the nodes themselves. Suppose only node3 is force-started while the other nodes are down, and we then modify cluster resources on node3; the configuration copies on the other nodes diverge, and when they come back they may fail to start because of this partition-in-time of the cluster database and similar reasons.
Therefore it is recommended that forced start be used only as a last resort. It can bring the cluster back to life for a short time, but after the revival the other nodes should be repaired immediately, rejoined to the cluster, and the ForceQuorum state removed.
You can see that after the forced start, the clustered DTC service has started normally at the Baoding site, and the IP address behind the cluster name is now on the 20 segment on the Baoding side.
If you open the dependency report of a cluster role, you can see a diagram similar to the following. Understanding dependencies is very important for multi-site cluster applications: AND means every associated child resource must be online before the parent resource can come online, while OR means that as long as one of the listed child resources is online, the parent resource can start. For example, the network name needs to be bound to an IP: if the 10 segment address can bind and register, the name uses the 10 segment; if the 10 subnet cannot bind, the 20 segment address can bind and register instead, and the network name can still come online.
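The dependency expression can also be inspected and set from PowerShell (a sketch; the resource names and the 20 segment address below are assumptions, not values taken from the screenshots):
Import-Module FailoverClusters
# show the current dependency expression of the role's network name resource
Get-ClusterResourceDependency -Resource "testdtc"
# an OR dependency: the name comes online if either site's IP address resource can bind
Set-ClusterResourceDependency -Resource "testdtc" -Dependency "[IP Address 10.0.0.9] or [IP Address 20.0.0.9]"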
That covers how node failures are handled under three-node Node Majority quorum.
Next, let's look at the second scenario.
Scenario 2 environment
Beijing site
Node1
Management network card 10.0.0.3, gateway 10.0.0.1, DNS 10.0.0.2
Storage network card 40.0.0.3, gateway 40.0.0.1
Heartbeat network card 18.0.0.1
Node2
Management network card 10.0.0.4, gateway 10.0.0.1, DNS 10.0.0.2
Storage network card 40.0.0.4, gateway 40.0.0.1
Heartbeat network card 18.0.0.2
08DC
Lan1 10.0.0.2, gateway 10.0.0.1
Lan2 20.0.0.2, gateway 20.0.0.1
Lan3 30.0.0.2, gateway 30.0.0.1
Gateway server
10.0.0.1
20.0.0.1
30.0.0.1
40.0.0.1
Baoding site
Node3
Management network card 20.0.0.5, gateway 20.0.0.1, DNS 10.0.0.2
Storage network card 30.0.0.5, gateway 30.0.0.1
Heartbeat network card 18.0.0.3
Node4
Management network card 20.0.0.6, gateway 20.0.0.1, DNS 10.0.0.2
Storage network card 30.0.0.6, gateway 30.0.0.1
Heartbeat network card 18.0.0.4
You can see that four nodes have been added to the cluster and a disk witness has been configured.
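If you prefer PowerShell, the quorum configuration can be checked and set like this (a sketch; the witness disk name is an assumption):
Import-Module FailoverClusters
# show the current quorum model and witness resource
Get-ClusterQuorum
# Node and Disk Majority, using a specific available cluster disk as the witness
Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 2"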
We again deploy a clustered DTC, which is currently hosted on node2 at the Beijing site.
When we power off node2 directly, we can see that the DTC cluster application is automatically transferred to node1 at the same site.
Next we power off node1 directly, simulating the loss of the entire Beijing site. Because we are using a disk witness, the cluster can tolerate losing half of its nodes and still work normally, but it reminds us that in the current situation, if one more node or the witness disk goes down, the cluster will shut down. So we should hurry to repair the Beijing site and bring it back online as soon as possible, and not let this state last too long.
Suppose the Beijing site is not recovered and another node fails at the Baoding site: the cluster will go down and shut itself off, and all cluster services will be inaccessible.
At this point, because the number of failed nodes has violated the quorum agreement, the cluster can only be brought back with a forced start. Again, this state should not last long; the other nodes should be repaired and brought back online as soon as possible.
Next, let's simulate a split-brain scenario, in which a network partition occurs between Beijing and Baoding and each side tries to go its own way. Strictly speaking, Lao Wang should first have the four nodes lose their connection to the witness disk, leaving four nodes with four votes, and then create a partition between the sites. But because Lao Wang runs AD and iSCSI on the same machine, shutting that machine down would stop all the nodes from working at all, so Lao Wang temporarily changed the cluster quorum model to Node Majority: four nodes, four votes, an architecture in which a split-brain can actually be triggered.
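The quorum model change itself is one command (a sketch):
Import-Module FailoverClusters
# drop the disk witness and use Node Majority, leaving four nodes with four votes
Set-ClusterQuorum -NodeMajority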
Now we trigger the split-brain partition by moving node3 and node4 onto a different network.
Once that happens, you can see on node2 that only node1 and node2 are up, while node3 and node4 show as down; on their side, they cannot form a cluster either.
However, if you try to access the cluster name test, you will find that it is still unreachable, because the cluster has no way to achieve quorum: neither side has a majority, so the whole cluster cannot work.
At this time, because the domain controller and the storage are on the Beijing side, the Beijing side can work normally, while the Baoding network has not yet been repaired. So we force-start the cluster partition in Beijing by running the forced start command on node2.
You can see that after running the forced start command on node2 alone, the whole cluster becomes available: node1, which is in the same partition, senses that node2 has formed the authoritative partition and automatically rejoins the cluster. But because this is a forced start, the cluster is still in the ForceQuorum state. Under no circumstances should the cluster be left in a forced start state for long; we should restore the network as soon as possible.
You can see that the clustered DTC is now online and working at the Beijing site.
Once the Baoding network is repaired, the cluster partition on that side senses the authoritative cluster partition in Beijing and automatically merges back in. The cluster is now working normally again, and the forced start warning has cleared.
To sum up, we looked at how the cluster handles nodes going down one after another under the Node Majority quorum model, and, under the disk witness model, how it handles node failures while the witness disk is present, how it behaves when the witness disk is lost, and what happens during a network partition.
To put it simply, while the cluster is working, it can operate normally as long as you do not violate the rules of your quorum agreement with it. When the critical point is reached, the cluster will remind you that it has reached the threshold, that one more lost vote will bring it down, and that it is about to shut down; that is the time to repair the other nodes.
Then, if nodes keep going down and only one node, or only a minority of nodes, is left, and you want to bring the cluster back from the dead to provide services, you can use forced quorum; the cluster can be started even with only a minority of nodes.
Forced quorum is mainly used to start the cluster with a minority of nodes, or to start one side of the cluster to provide services when a split-brain occurs. Either way, the forced quorum state should not last too long; otherwise there is a risk of the configuration getting out of sync, and the failed nodes or the network should be repaired as soon as possible.
When we execute the forced quorum command, the cluster actually does two things behind the scenes: it establishes the force-started side as the authoritative side, and it raises that partition's cluster configuration to the highest Paxos tag, similar to an authoritative restore in AD. The force-started side becomes the golden authority: the cluster runs on the authoritative side, and the cluster configuration of the other nodes is synchronized from the force-quorum partition. There is no denying that forced quorum is very practical much of the time.
That is the forced quorum and quorum handling Lao Wang wanted to discuss with you. I hope you get something out of it, so that when a cluster loses nodes one by one, you have a clear idea of how to deal with it.
Add a few points
1. A forced start only means we have saved the cluster; whether it can actually serve clients is another matter. Suppose the cluster has four nodes and they all host similar resources: can the single force-started node carry the load of four nodes? That is a real question. If it cannot carry the load, some cluster applications will not be able to come online and be accessed. We also need to plan for this last-node scenario. The best approach is to plan well at design time so that server resources are sufficient, and ideally to plan the priority of cluster applications, so that when this happens the higher-priority applications are brought online first. Another option is a cold-standby machine that normally does not participate in cluster voting; once only one node is left, the cold-standby node can be started to carry the load.
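On 2012-era clusters the two knobs mentioned here, role priority and node votes, can be set roughly like this (a sketch; testdtc and node4 are reused from the lab above, with node4 merely standing in for a cold-standby node, and on 2008 R2 changing node votes requires a hotfix):
Import-Module FailoverClusters
# failover/start priority of a clustered role: 3000 = High, 2000 = Medium, 1000 = Low, 0 = no auto start
(Get-ClusterGroup "testdtc").Priority = 3000
# remove the quorum vote from the cold-standby node so it does not count toward the majority
(Get-ClusterNode "node4").NodeWeight = 0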
2. There is one scenario Lao Wang did not write about above. Finally, out of curiosity, I tried the three nodes + witness disk quorum model on a 2008 R2 cluster, as shown in the last picture, and the result was rather miserable. When the cluster lost one node, it already told me the critical point had been reached: if the witness disk or one more node failed, the cluster would shut down. In that case the model has no advantage at all, because with three nodes under Node Majority I only need to make sure two nodes stay available, whereas here I additionally have to worry about the availability of the witness disk, which does nothing to keep the cluster available. The only benefit I can think of is that this scenario avoids the partition-in-time problem: if one node plus the witness disk is left at the end, configuration changes can be synchronized to the witness disk, and the other nodes can still be used normally when they come back online. In the 2012 era this has changed: three nodes with a witness disk can survive down to the last node without a forced start. In the next article we will actually test the changes that quorum has undergone in the 2012 era!