Detailed explanation of WSFC cluster service startup switches

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

Clussvc is the system service of Windows Server Failover Clustering (WSFC). It runs on every cluster node and is the main cluster service, responsible for communication between cluster components, handling failover operations, and managing cluster configuration. Clussvc also cooperates with the other components of the cluster, such as RHS and NetFT; factors such as node hardware, cluster membership, heartbeat detection results, and third-party software interference determine whether a node's cluster service can start normally.

The following figure shows the components included in the clussvc primary service. Almost all cluster components run under clussvc, so the starting and stopping of this service is critical: once the clussvc service stops, the node can no longer use any cluster functionality, and if clussvc is stopped on every node of the cluster, the cluster stops providing service entirely.

The following figure shows the position of clussvc in the cluster architecture. Clussvc receives operations from the cluster API and passes user requests on to the other cluster components for processing; it also coordinates communication among components inside the cluster, and the detection results of RHS and CPrepSrv are fed back to clussvc.

Because clussvc is the host service of the cluster, it is special: different scenarios call for different startup switches. The general startup command is: net start clussvc /<switch>
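As an illustration only, the following Python helper (not part of WSFC; the function and switch list are this article's own) composes the "net start clussvc /<switch>" command lines discussed throughout this article:

```python
# Illustrative helper: compose the clussvc startup command lines
# described in this article. The switch list reflects the switches
# covered here, not an exhaustive or authoritative set.
KNOWN_SWITCHES = {
    "fixquorum", "noquorumlogging", "resetquorumlog", "norepevtlogging",
    "forcequorum", "fq", "pq", "ips", "cq",
}

def clussvc_start_command(switch=None, nodes=None):
    """Build a 'net start clussvc' command with an optional switch.

    nodes is only meaningful for the 2003-era forcequorum, which
    requires the surviving node names to be listed explicitly.
    """
    cmd = "net start clussvc"
    if switch:
        s = switch.lower()
        if s not in KNOWN_SWITCHES:
            raise ValueError(f"unknown clussvc switch: {switch}")
        cmd += f" /{s}"
        if s == "forcequorum" and nodes:
            cmd += " " + ",".join(nodes)  # 2003-era node list
    return cmd

print(clussvc_start_command())        # net start clussvc
print(clussvc_start_command("fq"))    # net start clussvc /fq
print(clussvc_start_command("forcequorum", ["node5", "node6", "node7"]))
```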

From Windows 2000 through Windows Server 2003, the clussvc cluster service supported the following startup parameters.

Fixquorum: applies to the 2000 and 2003 era. This switch is mainly for scenarios where the cluster service cannot start because of the quorum device, for example when the mounted cluster witness disk is unstable and frequently goes offline. When started with the fixquorum parameter, the cluster service starts without bringing the cluster quorum device online; only the cluster IP and cluster name come online, and writing log records to the quorum device is turned off. After starting in this mode, the cluster application resources and quorum resources are offline, and the administrator can manually bring the quorum resource online and inspect the logs for diagnosis.

This startup parameter is best used when only one node is left in the cluster. After starting with it, the cluster database, cluster IP, and cluster name are mounted on that node to facilitate troubleshooting. If the other cluster nodes are running, it is recommended to stop their cluster services before troubleshooting the quorum device problem, to prevent them from seizing the cluster IP and cluster name.

After troubleshooting, restart the node's cluster service with the normal command: net start clussvc

NoQuorumLogging: after starting with this parameter, the cluster stops writing log records to the quorum device, allowing diagnosis of quorum log and checkpoint problems on the quorum device. Typically this switch is used on only one node; it was mainly used in the 2003 era when the quorum device's log files or checkpoint files were corrupted and you wanted to manually replace them with backup copies.

Note that starting the cluster with this parameter may cause time-partition problems; nodes started with NoQuorumLogging should not modify the cluster configuration.

Before 2008, only the cluster's quorum device kept an up-to-date copy of the cluster database; each node had to synchronize with the quorum disk, which replicated the cluster database to each node. After a shutdown and restart, each node also had to connect to the quorum disk to download the cluster database. If the quorum disk failed, the cluster could not start, so before 2008 the quorum disk was a single point of failure. Since 2008, the cluster has used a paxos-tagging mechanism, and every node can keep the latest copy of the cluster database. If the quorum device fails, the quickest fix is to add a new quorum disk and replace the old one; the new disk detects that the cluster nodes' database paxos tags are newer and automatically synchronizes the newer cluster database.
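A minimal sketch of the idea, not the actual WSFC implementation: if the paxos tag is modeled as a monotonically increasing sequence number, any participant can decide which copy of the cluster database is newest simply by comparing tags. The node names and tag values below are made up for illustration.

```python
# Sketch: paxos-style tags let cluster members decide which copy of
# the cluster database is newest. The tag is modeled as a simple
# monotonically increasing sequence number.
def newest_copy(copies):
    """copies: dict of holder name -> paxos tag.
    Returns the holder with the most recent database copy."""
    return max(copies, key=copies.get)

copies = {"node1": 41, "node2": 44, "quorum_disk": 42}
# A replacement quorum disk sees that node2's tag (44) is newer than
# its own (42), so it pulls node2's database instead of the reverse.
print(newest_copy(copies))  # node2
```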

Debug: in the 2003 era, the cluster log contained only entries written after the cluster service started, not the startup process itself. To troubleshoot the cluster service startup process, use the debug switch: change to the C:\System32\Cluster directory and run clussvc /debug > c:\clusdebug.txt at a command prompt. This starts the cluster service in debug mode and captures the entire startup process to the log file. It was mainly used in the 2003 era for problems where the cluster startup account or system configuration prevented the cluster service from starting. Since 2008, the cluster service startup log can be seen in the cluster log.

The Debug parameter also supports setting the startup diagnostic log level, for example: clussvc /debug loglevel=3 > c:\debug.log

Level 0: logs nothing

Level 1: logs only errors

Level 2: logs errors and warnings

Level 3: logs all events, including events that are not written to the event log
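The four levels above can be sketched as a simple severity filter. This is purely illustrative; the severity names and the filter function are this article's own, not clussvc internals.

```python
# Sketch of the debug log levels described above (0-3).
LEVELS = {
    0: (),                              # nothing
    1: ("error",),                      # errors only
    2: ("error", "warning"),            # errors and warnings
    3: ("error", "warning", "info"),    # everything
}

def keep(event_severity, loglevel):
    """Return True if an event of this severity is recorded at loglevel."""
    if loglevel >= 3:  # level 3 records all events
        return True
    return event_severity in LEVELS[loglevel]

assert not keep("warning", 1)  # level 1: errors only
assert keep("warning", 2)      # level 2: errors and warnings
assert keep("info", 3)         # level 3: all events
```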

DebugResMon: used to debug resource monitor processes and, through them, the resource dynamic link libraries (DLLs) loaded by the resource monitor. Developers can use this switch to debug resource monitor processes and their custom resource DLLs. Before each resource monitor process starts, the cluster service waits and prints the message "waiting for the debugger to connect to the resmon process X", where X is the process ID (PID) of the resource monitor process. The cluster service waits this way for every resource monitor process it creates. After the user attaches a debugger to the resource monitor process and the process starts, the cluster service continues its initialization.

Usage: clussvc /debug /debugresmon

ResetQuorumLog: for the 2000 and 2003 era. If the quorum log and checkpoint files are missing or corrupted, this switch recreates them from the information in the local node's %SystemRoot%\Cluster\CLUSDB registry hive. It applies to scenarios where the quorum disk's quorum log is corrupted: the quorum log and checkpoint files are rebuilt from the node's local registry hive, after which the cluster can be started with fixquorum in order to replace the quorum disk.

Usage: net start clussvc /resetquorumlog. After this command runs, MSCS automatically detects the corruption of the quorum log and regenerates it from the node's local registry hive.

The different scenarios for ResetQuorumLog and NoQuorumLogging:

NoQuorumLogging: a backup of the quorum device exists; temporarily stop writing to it, then restore from the backup.

ResetQuorumLog: no backup of the quorum device exists; the quorum log is rebuilt from the node's local registry.

After starting the cluster service with either parameter, you must afterwards restart it normally with net start clussvc.

NoRepEvtLogging: for 2000 and 2003. In those versions the cluster log was generated in real time by the cluster service, and this parameter prevents the replication of event log entries. If there are a large number of event log entries, the cluster service normally replicates them and writes them to cluster.log, which can make cluster.log wrap quickly. With this parameter, the node's event log entries are not copied to other nodes, reducing network bandwidth, and entries not yet flushed to the cluster log can be dumped to a local log.

net start clussvc /norepevtlogging prevents this node's event log from being copied to other nodes, while still receiving entries from other nodes normally.

clussvc /debug /norepevtlogging > c:\debugnorep.log dumps this node's entries that have not yet been flushed to the cluster log into a local log file.

ForceQuorum: the most widely known cluster startup parameter, introduced in the 2003 era and abbreviated FQ (not to be confused with fixquorum). Its main purpose is to allow a minority partition to continue providing service, and it differs somewhat from the forcequorum of the 2008 era. For example, in the 2003 era, suppose a cluster has 4 nodes at the Beijing site and 3 nodes at the Shanghai site. A partition occurs between the two sites, but Beijing is known to be unable to provide service, so the Shanghai site must be forced to start. The startup command must name the Shanghai site nodes, for example: net start clussvc /forcequorum node5,node6,node7. After this command runs, the Shanghai site restarts as a cluster whose list of available owners contains only nodes 5, 6, and 7. After the Shanghai site is up, manually stop the cluster service on each Beijing node to prevent them from seizing resources, and restart their cluster services normally once the partition is healed.
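The arithmetic behind the Beijing/Shanghai example can be sketched as follows. This is a simplification using node-majority counting only (witness disks and file shares are ignored), just to show why the 3-node Shanghai partition cannot form quorum by itself and therefore needs /forcequorum:

```python
# Sketch: node-majority quorum arithmetic for the partition example
# above. A partition has quorum only if it holds more than half of
# the cluster's nodes. Witness votes are ignored for simplicity.
def has_node_majority(partition_nodes, total_nodes):
    return partition_nodes > total_nodes // 2

total = 7  # 4 nodes in Beijing + 3 nodes in Shanghai
print(has_node_majority(4, total))  # True:  Beijing could form quorum
print(has_node_majority(3, total))  # False: Shanghai must force quorum
```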

In the 2008 era, WSFC introduced several new parameters, PreventQuorum and IgnorePersistentState, and changed ForceQuorum.

As Lao Wang mentioned in a previous article, ForceQuorum changed in the 2008 era with the introduction of the paxos-tagging mechanism. As before, it forces cluster nodes to start providing service when only a minority of nodes remain, but it is now much more convenient. In 2008 you only need to run net start clussvc /FQ on one of the minority nodes (in 2008, the abbreviation FQ refers to forcequorum). The cluster database of the node on which FQ was executed is promoted to the golden copy, and every other node that wants to participate in the cluster must first synchronize its cluster database with the FQ node. In the 2008 era, if the minority partition performs FQ forced quorum, the majority partition must be started with PQ to prevent it from forming its own cluster; once the partition is healed, the majority side automatically synchronizes the latest cluster database from the FQ side.

The IPS parameter was introduced in 2008 R2, ForceQuorum was changed in 2008, and PQ was introduced in 2008 R2 (available for 2008 through a hotfix).

The IPS switch is an interesting parameter, similar to fixquorum in the 2000 era, except that fixquorum brings only the cluster IP and cluster name online and leaves all other resources offline, while the IgnorePersistentState parameter brings online the cluster IP, cluster name, and quorum disk (all of the core group components) but not any clustered application. This helps administrators restore normal cluster service before troubleshooting problems in the clustered applications.

Under normal circumstances, when the cluster service starts, the default behavior is to bring all resources online. The IPS switch ignores each resource's current PersistentState value and keeps everything offline.

After starting with this parameter, the cluster as a whole is online and can provide service. The switch affects only the services and application groups in the cluster.

This parameter requires that the cluster meets the quorum requirement before it can be used.

Applicable scenarios:

1. Every cluster node's resource load is full, causing nodes to crash at startup. Start the cluster with /IPS, bring part of the cluster applications online first, debug, and then bring the remaining applications online.

2. A third-party application hosted by the cluster causes the cluster to appear hung. Start the cluster with the IPS parameter, investigate the application problem, and bring the application online after it is fixed.

Usage: net start clussvc /ips, or net start clussvc /ips /fq if the cluster does not meet the quorum requirement.
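The /IPS behavior described above can be sketched as follows. The resource names and the PersistentState representation are illustrative assumptions, not WSFC internals: core group resources come online, while each application's saved PersistentState is ignored and the application stays offline.

```python
# Sketch of /IPS startup: core group resources come online; every
# other resource's PersistentState is ignored and it stays offline.
CORE = {"Cluster IP", "Cluster Name", "Quorum Disk"}

def startup_state(resources, ips=False):
    """resources: dict of name -> PersistentState (True = was online
    before shutdown). Returns dict of name -> online after startup."""
    out = {}
    for name, persistent in resources.items():
        if name in CORE:
            out[name] = True                 # core group always online
        else:
            out[name] = False if ips else persistent
    return out

res = {"Cluster IP": True, "Cluster Name": True,
       "Quorum Disk": True, "SQL Server": True}
print(startup_state(res, ips=True))
# Core group online, the clustered application left offline.
```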

Since 2012, prevent-quorum technology has changed: when the majority partition detects that a minority partition has performed forced quorum, it automatically performs the prevent-quorum action, that is, it recognizes the force-quorumed partition as the authoritative cluster and synchronizes its own cluster database to the latest copy before starting its own cluster service. In the 2008 era, prevent quorum usually had to be performed manually in forced-quorum scenarios; otherwise the cluster database could be overwritten. Since 2012 this is done automatically.

Starting with 2016, WSFC keeps up with the latest technology: storage replica and hyper-converged scenarios, and integration with Azure (WSFC on Azure, Azure witness). 2016 also changed the cluster's default detection mechanism, introducing a new transient-disconnection mechanism to prevent cluster applications from failing over because of a momentary interruption: a cluster failure within a specified window does not trigger failover. If a certain number of transient interruptions occur within a certain period, the node is quarantined; by default a node stays in the quarantine state for 7200 seconds. During that time the node will not host applications: all virtual machines are live-migrated away and other cluster resources are moved off. If the node recovers ahead of time, you can run net start clussvc /CQ or Start-ClusterNode -CQ to perform the ClearQuarantine operation and restore the node manually.
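The quarantine bookkeeping described above can be sketched as follows. The failure threshold and the class are illustrative assumptions; only the 7200-second default and the /CQ (ClearQuarantine) behavior come from the text:

```python
# Sketch of 2016-era node quarantine: a node that fails repeatedly
# is quarantined for 7200 seconds by default; ClearQuarantine (/CQ)
# lifts the quarantine early. Threshold value is illustrative.
DEFAULT_QUARANTINE_SECONDS = 7200

class NodeHealth:
    def __init__(self, failure_threshold=3):
        self.failures = 0
        self.threshold = failure_threshold
        self.quarantined_until = None  # timestamp, or None

    def record_transient_failure(self, now):
        self.failures += 1
        if self.failures >= self.threshold:
            self.quarantined_until = now + DEFAULT_QUARANTINE_SECONDS

    def is_quarantined(self, now):
        return (self.quarantined_until is not None
                and now < self.quarantined_until)

    def clear_quarantine(self):  # net start clussvc /CQ
        self.failures = 0
        self.quarantined_until = None

node = NodeHealth()
for t in (0, 10, 20):
    node.record_transient_failure(t)
print(node.is_quarantined(100))   # True: quarantined until t = 7220
node.clear_quarantine()
print(node.is_quarantined(100))   # False: manually cleared with /CQ
```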

To summarize, the clussvc startup parameters that remain in the latest 2016 release are: FQ, PQ, IPS, and CQ.

© 2024 shulou.com SLNews company. All rights reserved.
