WSFC backup recovery

2026-02-14 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Any IT system needs a backup and recovery mechanism, and Windows Server Failover Clustering (WSFC) is no exception. For WSFC, we need to focus on three main areas:

Backup and recovery of the cluster name object (CNO) and virtual computer objects (VCOs)

Backup and recovery of cluster data disks and CSV content

Backup and recovery of the cluster database

Backup and recovery of the cluster CNO and VCOs is really backup and recovery of Active Directory data. Normally the Active Directory Recycle Bin, available since Windows Server 2008 R2, is enabled, so a computer object deleted by mistake can be restored, and the name can then be repaired in Failover Cluster Manager. As mentioned in the previous article, from Windows Server 2012 the object can be restored directly in the Active Directory Administrative Center. We can also guard against the problem with accidental-deletion protection, backups and similar measures. Once a CNO or VCO is deleted by mistake, the cluster can no longer be accessed and cluster authentication fails.

Cluster data disks and CSV content can be backed up with DPM or other tools. For example, if we run many virtual machines on the cluster's CSVs, we need to back up those virtual machines regularly. If the backup tool supports it, we can back up all of the virtual machines at the CSV level to prevent the loss of cluster shared data.

Backup and recovery of the cluster database is the main topic today. As we said before, the cluster database is the core of the cluster's configuration. It stores all the configuration information of the WSFC cluster and is synchronized in real time between the nodes and the witness disk. When a failover happens, the node that takes over refers to the cluster database.

The cluster database is stored in the registry of each node and on the witness disk. It is included in the system state, so we can back it up by backing up the system state with Windows Server Backup or DPM.

There are two main kinds of cluster database recovery: authorized mode recovery and unauthorized mode recovery. Microsoft documentation calls these authoritative restore and non-authoritative restore. Interestingly, this is just like Active Directory database recovery.

Authorization mode recovery

When would you use authorized recovery? Suppose you took a backup while the cluster was running normally, and later a careless operation breaks some of the cluster configuration so that the whole cluster starts to misbehave. In that case, pick one node and perform an authorized recovery on it: the cluster service is stopped, the cluster database configuration is restored, and the cluster service on that node is started again. Note that an authorized restore stops the cluster service on all cluster nodes. Afterwards the restored node must start first. The authorized restore rolls the cluster database configuration back to the earlier Paxos tag and then promotes the cluster database on the restored node to the golden copy. When you then manually start the cluster service on the other nodes, they synchronize the cluster database configuration from the node holding the golden copy, and the cluster returns to normal.

As you can see, the key points of authorized recovery are:

1. The machine is restored online, without a shutdown, and no restart is needed after recovery.

2. WSFC and Windows Server Backup are aware of each other, which is what makes an authorized recovery of the cluster possible.

In short, authorized recovery rolls back the cluster configuration and promotes the cluster database on the restored node to the golden copy.

Note that the Paxos tag of the cluster database changes in real time. When performing an authorized recovery of the cluster database, you must not start the cluster service on all nodes at once. If the configuration is accidentally changed on another node, the authorized restore fails: the Paxos tag on the modified node becomes the most recent, and the restored node synchronizes its cluster database from that node instead.
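The rollback-then-promote behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the Paxos tag is reduced to a single integer where the higher value is newer, and `Node`, `authoritative_restore`, `change_config` and `synchronize` are hypothetical names, not part of WSFC.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    paxos_tag: int  # simplified stand-in: a higher tag means a newer database
    config: str


def authoritative_restore(node, backup_tag, backup_config, all_nodes):
    # Step 1: roll the node's database back to the backed-up state.
    node.paxos_tag = backup_tag
    node.config = backup_config
    # Step 2: promote it to the golden copy by making its tag the newest.
    node.paxos_tag = max(n.paxos_tag for n in all_nodes) + 1


def change_config(node, new_config):
    # Any configuration change advances the tag on the node that made it.
    node.paxos_tag += 1
    node.config = new_config


def synchronize(nodes):
    # Every node adopts the database that carries the newest tag.
    newest = max(nodes, key=lambda n: n.paxos_tag)
    for n in nodes:
        n.paxos_tag, n.config = newest.paxos_tag, newest.config
    return newest.name


hv01 = Node("HV01", paxos_tag=7, config="broken")
hv02 = Node("HV02", paxos_tag=7, config="broken")
authoritative_restore(hv01, backup_tag=5, backup_config="good",
                      all_nodes=[hv01, hv02])
winner = synchronize([hv01, hv02])  # HV02 takes the restored configuration
```

In this model, calling `change_config(hv02, ...)` after the restore gives HV02 the newest tag again, so the next `synchronize` overwrites the restored configuration. That is the failure case the warning above describes.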

What about unauthorized recovery?

As you have probably guessed, unauthorized recovery is similar to authorized recovery, except that the restored node's cluster database is not promoted to a golden copy.

The biggest difference between unauthorized recovery and authorized recovery is that unauthorized recovery requires a restart of the machine, and the recovery time is longer.

In essence, when we perform an unauthorized restore, it is equivalent to performing a complete bare metal recovery for the node.

Lao Wang believes that unauthorized recovery mainly applies to the following two scenarios:

A single cluster node has a problem: it is unstable and frequently crashes with a blue screen, and you want to reinstall it rather than keep using it. You can format the node, boot from the system installation media, and perform a bare metal recovery. After the recovery, the Paxos tag of the node's cluster database is the old one and is not promoted to a golden copy, so the node synchronizes its database from the other cluster nodes, which hold the latest Paxos tag.

The whole cluster has a problem and none of the nodes can be used, but a bare metal backup was taken earlier. You can set up a new machine, boot from the system installation media, and restore the bare metal backup to it. The cluster is revived on that single node, and further nodes can be joined to the cluster once they are ready.
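The two scenarios differ only in whether any peer with a newer database survives. A minimal sketch, assuming each cluster database copy is reduced to a `(paxos_tag, config)` tuple with an integer tag; the function name is hypothetical:

```python
def database_after_unauthorized_restore(restored, surviving_peers):
    """Return the (paxos_tag, config) pair the restored node ends up with.

    The restored copy is never promoted, so the newest tag among the
    restored node and its surviving peers decides the outcome.
    """
    candidates = [restored, *surviving_peers]
    return max(candidates, key=lambda db: db[0])


# Scenario 1: one node is rebuilt while its peers still hold tag 9.
single_node = database_after_unauthorized_restore(
    (5, "from backup"), [(9, "current"), (9, "current")])

# Scenario 2: the whole cluster is gone, so only the restored copy exists.
whole_cluster = database_after_unauthorized_restore((5, "from backup"), [])
```

With surviving peers the restored node adopts their database. With none, the backed-up configuration is all that is left and the cluster comes back from it.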

To sum up.

Authorized recovery is mainly used to restore the cluster configuration and synchronize it to all other nodes.

Unauthorized recovery is mainly used to bring a node or a cluster back into normal use. The restored node synchronizes its cluster database from the other available nodes.

At present, as far as Lao Wang knows, only Windows Server Backup and DPM support authorized recovery of the Microsoft cluster database, and Windows Server Backup is the main choice. With Windows Server Backup you can watch the authorized recovery of the cluster as it proceeds.

Unauthorized recovery is nothing more than bare metal backup and bare metal recovery, so besides Microsoft's Windows Server Backup and DPM, some third-party backup tools can probably be used as well.

In addition, Lao Wang suggests backing up the cluster database configuration and the cluster data separately. A cluster database backup should cover only the cluster database configuration, and a recovery should restore only the content related to the cluster database. Cluster data disks and CSV content should be backed up on their own, not together with the cluster database.

Next, we walk through an authorized recovery and an unauthorized recovery of the cluster database in practice.

Authorization recovery, scenario introduction

DC01&iscsi

Lan:10.0.0.2 255.0.0.0

Iscsi:30.0.0.2 255.0.0.0

HV01

MGMET:10.0.0.9 255.0.0.0 DNS 10.0.0.2

ISCSI:30.0.0.9 255.0.0.0

CLUS:18.0.0.9 255.0.0.0

HV02

MGMET:10.0.0.10 255.0.0.0 DNS 10.0.0.2

ISCSI:30.0.0.10 255.0.0.0

CLUS:18.0.0.10 255.0.0.0
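The lab addressing above can be sanity-checked with Python's standard `ipaddress` module. This is only a sketch: the dictionary layout is an assumption, while the host names, labels, addresses and masks are copied from the listing.

```python
import ipaddress

# Host -> interface label -> "address/mask", as listed above.
lab = {
    "DC01": {"Lan": "10.0.0.2/255.0.0.0", "Iscsi": "30.0.0.2/255.0.0.0"},
    "HV01": {"MGMET": "10.0.0.9/255.0.0.0", "ISCSI": "30.0.0.9/255.0.0.0",
             "CLUS": "18.0.0.9/255.0.0.0"},
    "HV02": {"MGMET": "10.0.0.10/255.0.0.0", "ISCSI": "30.0.0.10/255.0.0.0",
             "CLUS": "18.0.0.10/255.0.0.0"},
}

interfaces = {
    (host, label): ipaddress.ip_interface(addr)
    for host, nics in lab.items()
    for label, addr in nics.items()
}

# The distinct networks in use: management, cluster heartbeat and iSCSI.
networks = sorted({iface.network for iface in interfaces.values()})

# Pairs of networks that overlap (there should be none).
overlapping = [
    (a, b)
    for i, a in enumerate(networks)
    for b in networks[i + 1:]
    if a.overlaps(b)
]

# True if two interfaces were given the same IP address.
duplicate_ips = len({iface.ip for iface in interfaces.values()}) != len(interfaces)

print([str(n) for n in networks], overlapping, duplicate_ips)
```

The check confirms that management, iSCSI and cluster traffic sit on three separate /8 networks and that no address is assigned twice.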

The cluster is currently running normally. The cluster name is fscluster, and the clustered file service application is fileshare. We have already performed a bare metal backup.

The procedure for authorized recovery is as follows:

Retrieve and confirm the backup information

Start the cluster recovery with the wbadmin command

wbadmin, integrated with WSFC, stops the cluster service on all nodes

The cluster database is restored to the earlier backup

The cluster service on the restored node is started and its cluster database is promoted to the golden copy

Manually start the cluster service on the other cluster nodes

First, we break the cluster by deleting the file server content.

1. Check cluster node backup records

wbadmin get versions

Check backup details

wbadmin get items -version:10/24/2017-02:17

As you can see, although we only performed a bare metal backup, Windows Server Backup detected that this is a cluster node and backed up the cluster automatically. In the Windows Server 2003 era the cluster database was saved only as part of the system state; since Windows Server 2008 it appears as a separate application.
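The version identifier passed to `wbadmin get items` has to be copied from the `wbadmin get versions` output. The helper below is a rough sketch of pulling it out with a script. It assumes the output contains lines of the form `Version identifier: <value>`. The sample text is illustrative and not captured from a real run, and `version_identifiers` and `read_versions` are hypothetical names.

```python
import re
import subprocess

# Assumed line shape: "Version identifier: 10/24/2017-02:17"
VERSION_LINE = re.compile(r"^\s*Version identifier:\s*(\S+)\s*$", re.MULTILINE)


def version_identifiers(text):
    """Return every version identifier found in wbadmin-style output."""
    return VERSION_LINE.findall(text)


def read_versions():
    """Run wbadmin and parse its output (Windows only, elevated prompt)."""
    result = subprocess.run(["wbadmin", "get", "versions"],
                            capture_output=True, text=True, check=True)
    return version_identifiers(result.stdout)


# Illustrative sample; only the identifier itself comes from this article.
sample = (
    "Version identifier: 10/24/2017-02:17\n"
    "Can recover: (details omitted in this sketch)\n"
)
identifiers = version_identifiers(sample)
```

`read_versions()` is not called here because it needs a Windows host with the Windows Server Backup command-line tools installed.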

2. Perform the authorized recovery of the cluster database online through wbadmin

wbadmin start recovery -itemtype:app -items:cluster -version:10/24/2017-02:17

As described earlier, once you confirm by entering Y, the cluster service on the cluster nodes is stopped, the database is restored, and the cluster service on the restored node is started again.
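Because each switch of the recovery command must be a separate argument, it can help to assemble the command from its parts. A minimal sketch: `cluster_recovery_command` is a hypothetical helper, the identifier format check is an assumption based on the value used in this article, and the optional `-quiet` switch (which suppresses the confirmation prompt) should be used with care on a real cluster.

```python
import re

# Assumed identifier shape, MM/DD/YYYY-HH:MM, as in 10/24/2017-02:17.
VERSION_FORMAT = re.compile(r"\d{2}/\d{2}/\d{4}-\d{2}:\d{2}")


def cluster_recovery_command(version, quiet=False):
    """Return the argument list for restoring the cluster database."""
    if not VERSION_FORMAT.fullmatch(version):
        raise ValueError(f"unexpected version identifier: {version!r}")
    args = ["wbadmin", "start", "recovery",
            "-itemtype:app", "-items:cluster", f"-version:{version}"]
    if quiet:
        args.append("-quiet")  # skip the Y/N confirmation
    return args


command = cluster_recovery_command("10/24/2017-02:17")
print(" ".join(command))
```

The returned list can be passed to `subprocess.run` on the node being restored. Keeping the arguments as a list avoids the missing-space mistakes that are easy to make when typing the switches by hand.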

After the restore is complete, the prompt is as follows

As you can see, the restore first stops the cluster service on all nodes, then starts the cluster service on the restored node and promotes its database to the golden copy.

Manually start the HV02 node cluster service

Cluster configuration is restored as before, and authorized recovery is complete

View the authorized recovery process in the cluster log

The cluster database restore starts

The Paxos tag is restored and promoted to the golden copy

The cluster recovery process stops the cluster service on all nodes of the cluster, then automatically starts the restored node and promotes its Paxos tag to the most recent. When the other nodes rejoin, they must synchronize the cluster database from the restored node before they can join the cluster normally.

Next, we will perform an unauthorized restore.

The environment is the same as authorized recovery. Here we simulate that the cluster crashes completely and neither node can be used. We create a new HV03 with exactly the same configuration, and then restore the cluster content to this node.

Since we will restore over the network, the new HV03 node needs access to the backup folder, so we temporarily set up a DHCP server.

The HV01 and HV02 nodes are now powered off and cannot be powered on again.

Create a new HV03 virtual machine with the same configuration, boot it from the Windows Server 2016 installation media, and select Repair your computer.

Troubleshoot

System Image Recovery

Once in the wizard, if the new machine is connected to the environment and has obtained an address from DHCP, you can reach the backup shared folder by entering its network path and credentials.

As usual, click Next.

If you use a new hardware server, you need to load the driver here

Make a cup of tea and wait.

After recovery, the machine restarts and boots into the system.

The network cards of the node are restored to the state they had on HV01. Occasionally some network cards are not restored properly; if that happens, re-enter their settings.

The storage is connected normally. In the 2008 R2 era, storage sometimes had to be reconnected after an unauthorized recovery. This was improved in 2012 and 2016, and the storage state stays normal in most cases.

Open Failover Cluster Manager: the cluster has also recovered normally, and only the restored HV01 node is currently available.

You can redo other nodes and add them later.

At this point, we have finished restoring the cluster using the existing bare metal backup in the event of a complete collapse of the cluster.

Besides the recovery shown above, there is another unauthorized recovery scenario: the node is still running, so the following command can be executed on it to restore a single crashed node:

wbadmin start systemstaterecovery -version:

This recovery is only a system state recovery and does not promote the Paxos tag of the database copy. After the restart completes, the node synchronizes the latest cluster database content from the other existing nodes.

Lao Wang chose to demonstrate the complete collapse of the cluster because he thinks unauthorized recovery plays its most important role in that scenario.

If only a single node has collapsed, rather than going to the effort of an unauthorized recovery, we might as well build a new node and join it to the cluster.

Backup and recovery for the cluster

Lao Wang suggests combining the following measures:

Enable the Active Directory Recycle Bin. If a CNO or VCO is deleted by mistake, restore it in Active Directory and then repair the cluster. If all of the cluster's information has been deleted, restore the CNO and VCOs first.

Take a bare metal backup of a cluster node. It serves for unauthorized recovery of a crashed node or cluster, and for authorized recovery of the cluster database configuration.

For cluster data disks and CSV, choose a cluster-aware backup tool for backup and recovery. If all of the cluster information has been deleted, first restore the CNO and VCOs, then the cluster database configuration, and finally the cluster data disks.
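The restore order recommended above is a simple dependency chain, and writing it down explicitly makes it easy to check a recovery runbook against it. A small sketch using the standard `graphlib` module (Python 3.9 or later); the item labels are just descriptive strings chosen for this example.

```python
from graphlib import TopologicalSorter

# Each item maps to the set of items that must be restored before it.
depends_on = {
    "CNO/VCO (Active Directory)": set(),
    "cluster database configuration": {"CNO/VCO (Active Directory)"},
    "cluster data disks / CSV": {"cluster database configuration"},
}

restore_order = list(TopologicalSorter(depends_on).static_order())

for step, item in enumerate(restore_order, start=1):
    print(f"{step}. restore {item}")
```

If a further dependency were added later, the sorter would still produce a valid order or raise `CycleError` if the dependencies contradicted each other.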
