2025-04-10 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Lao Wang once wrote an article that briefly covered the fault domain and site awareness features of WSFC 2016. With further hands-on use, however, he found that the concept of site awareness runs through many features of WSFC 2016, so he decided to write this follow-up. It discusses how to think about site awareness, fault domains, and the WSFC 2016 Health Service in real-world deployments.
# 1. A first look at WSFC 2016 fault domains
Generally speaking, we only hear about fault domains when we deliver or consume an SLA. For example, when we buy a service from a cloud provider, the provider guarantees several nines of availability, but only if we place our application's virtual machines in different fault domains. The provider will typically tell customers to do exactly that: virtual machines in different fault domains are placed in different racks, are never maintained at the same time, and have a very low chance of failing together.
As the provider, we have to set the fault domain policy behind the scenes. A fault domain is not a fixed technology but a specification. Once the specification is introduced, administrators know that the machines a user has placed in different fault domains must never be maintained at the same time, and at the technical level the cluster system or VIM system ensures that resources in different fault domains are always placed in different racks or cabinets. At this stage a fault domain is a logical definition plus a physical implementation, and whether the physical implementation actually happens depends on whether the infrastructure is aware of fault domains. WSFC 2016 supports logically defining three fault domain levels: Chassis, Rack, and Site.
At present, only S2D truly implements fault domain awareness. Once S2D detects that the cluster has Chassis or Rack fault domains configured, it always spreads the copies of each extent across different chassis or racks.
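As a minimal sketch of defining the three levels mentioned above, the following uses the FailoverClusters cmdlets `New-ClusterFaultDomain` and `Set-ClusterFaultDomain -Parent`. The names `Site-A`, `Rack-01`, `Chassis-01`, and `Node1`, and all location strings, are hypothetical placeholders. Defining fault domains before enabling S2D is generally recommended so that S2D can take the hierarchy into account when placing extents.

```powershell
# Sketch: build a Site > Rack > Chassis > Node fault domain hierarchy.
# All names and location strings are hypothetical placeholders.
Import-Module FailoverClusters

# Create the logical containers, top level first
New-ClusterFaultDomain -Name "Site-A"     -Type Site    -Location "Datacenter A"
New-ClusterFaultDomain -Name "Rack-01"    -Type Rack    -Location "Room 1, Row 3"
New-ClusterFaultDomain -Name "Chassis-01" -Type Chassis -Location "Rack unit 20"

# Nest them: chassis inside the rack, rack inside the site
Set-ClusterFaultDomain -Name "Chassis-01" -Parent "Rack-01"
Set-ClusterFaultDomain -Name "Rack-01"    -Parent "Site-A"

# Cluster nodes already exist as fault domains of type Node;
# place one of them in its chassis ("Node1" is hypothetical)
Set-ClusterFaultDomain -Name "Node1" -Parent "Chassis-01"

# Review the resulting hierarchy
Get-ClusterFaultDomain
```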
# 2. More on site awareness and fault domains
Lao Wang believes site awareness provides seven main functions:
1. Failover rules: with site awareness configured, a role first tries to fail over to nodes in the same site. Anti-affinity and possible-owner settings take precedence over site awareness.
2. Drain (maintenance) rules: when a node is drained, its roles first try to move to nodes in the same site. Again, anti-affinity and possible-owner settings take precedence over site awareness.
3. Site-specific heartbeat: the cross-site heartbeat frequency can be configured only after site awareness is configured for the cluster.
4. Site vote pruning: once site awareness is configured, a preferred site can be set. In a 50/50 split, the nodes in the preferred site win, and the non-preferred site automatically loses one vote.
5. Hierarchical preferred sites: a cluster-level preferred site drives vote pruning of the non-preferred site, while group-level preferred sites let different cluster groups prefer different sites (multi-active).
6. Storage site affinity: after site awareness is configured, a virtual machine by default follows the site of its CSV, because site awareness assumes that running the virtual machine in the same site as its CSV is more efficient. Configuring a storage preferred site ensures that the virtual machine and its CSV are always in the same site; if a virtual machine finds that it is not in the same site as its CSV, it is moved to the CSV's site.
7. Stretch cluster configuration: a stretch cluster really involves two layers, the applications running on the cluster and the replicated storage that fails over automatically. Although the storage can fail over across sites automatically, storage replication in a stretch cluster does not understand multiple sites; it only copies disk contents to the designated nodes and interacts with the cluster. The applications, however, still need multi-site failover logic. By default an application can move to any available node, so a virtual machine may land in the remote site while its storage is still served from the primary site, which reduces access efficiency. The best practice for a stretch cluster is therefore to combine it with site awareness, so that applications fail over within the local site by default and always run where their storage is local.
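The site heartbeat and preferred-site items above correspond to cluster and group properties in WSFC 2016. Below is a minimal sketch, assuming the nodes are already assigned to Site fault domains; the site names `Site-A` and `Site-B`, the group name `VM-Finance`, and the timing values are hypothetical examples, so check the current values on your cluster before changing anything.

```powershell
# Sketch: site-level settings in WSFC 2016.
# Assumes nodes are already assigned to Site fault domains.
# "Site-A", "Site-B" and "VM-Finance" are hypothetical names.
Import-Module FailoverClusters

# Site-specific heartbeat (example values; check your current settings first)
(Get-Cluster).CrossSiteDelay     = 1000   # milliseconds between cross-site heartbeats
(Get-Cluster).CrossSiteThreshold = 20     # missed heartbeats before the node is treated as down

# Cluster-level preferred site: decides the winner of a 50/50 split
(Get-Cluster).PreferredSite = "Site-A"

# Group-level preferred site: this group prefers the other site
(Get-ClusterGroup -Name "VM-Finance").PreferredSite = "Site-B"
```

Setting the preferred site on the cluster and on individual groups is what the hierarchical preferred site item refers to: the cluster-level value drives vote pruning, while group-level values let different workloads favor different sites.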
When designing a cross-site cluster, besides networking, storage, and quorum, we also need to consider the cluster's placement policy. In many cases, ignoring the placement policy leads to additional downtime, while using it well can solve many complex problems.
To put it bluntly, Lao Wang sees site awareness and S2D fault domain awareness as two different things. Site awareness defines the site architecture inside the cluster, giving failover, draining, heartbeat, and quorum an additional input to consult. In effect, the multi-site architecture that used to exist only in our heads is expressed in software, and the cluster components refer to it as they work.
Site awareness is how WSFC 2016 implements this, and some of its behaviors could already be achieved in earlier versions. For example, local-site-first failover used to be achieved by defining preferred owners, and vote pruning for a site used to be configured with LowerQuorumPriorityNodeID. What is new in WSFC 2016 site awareness is that it strings these separate cluster functions together under a single site definition. That is its strength, and because site awareness can be configured in bulk with PowerShell, it is also easier to manage than the pre-2016 approaches. In short, we should gradually embrace this concept and apply it to make multi-site cluster architectures more robust.
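For comparison, here is roughly what the pre-2016 approach looked like: preferred owners for local-first failover and LowerQuorumPriorityNodeID for vote pruning, each configured separately. This is a sketch only; the group name `VM-Finance` and the node names `Node1`, `Node2`, and `Node4` are hypothetical.

```powershell
# Sketch: approximating site behavior before WSFC 2016.
# "VM-Finance", "Node1", "Node2" and "Node4" are hypothetical names.
Import-Module FailoverClusters

# Local-first failover: list the local-site nodes as preferred owners
Set-ClusterOwnerNode -Group "VM-Finance" -Owners Node1, Node2

# Vote pruning: choose the node that loses its vote in a 50/50 split
$nodeId = (Get-ClusterNode -Name "Node4").Id
(Get-Cluster).LowerQuorumPriorityNodeID = $nodeId
```

With site awareness, the same intent is expressed once by assigning nodes to Site fault domains and setting a preferred site, instead of maintaining owner lists per group.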
# 3. Finally: fault domains and the Health Service
To sum up, Lao Wang believes fault domain definitions in WSFC 2016 have three main uses:
1. Working with fault-domain-aware features such as S2D (hopefully more features like S2D will become fault domain aware in the future).
2. Working with WSFC site awareness to control cross-site failover, drain maintenance, heartbeat detection, and quorum.
3. Working with the Health Service to pinpoint the location of faults.
Fault domains created in PowerShell are really just logical definitions stored as text. Without a component that understands them, such as S2D or WSFC, they are plain text and have no effect; only a component that is aware of them can turn the defined fault domain levels into physical placement.
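To see that fault domains are only stored text until something consumes them, a sketch like the following attaches a location string to an existing fault domain and reads the definitions back. `Rack-01` and the location string are hypothetical. The Health Service is one consumer of this text: it shows the location in its fault output.

```powershell
# Sketch: fault domain metadata is plain text until a component such as
# S2D, WSFC site awareness, or the Health Service consumes it.
# "Rack-01" and the location string are hypothetical.
Import-Module FailoverClusters

# Attach a human-readable location to an existing rack fault domain
Set-ClusterFaultDomain -Name "Rack-01" -Location "Building 2, Room 105, Row 3"

# Read the stored definitions back
Get-ClusterFaultDomain

# Or export the whole hierarchy as XML
Get-ClusterFaultDomainXML
```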
With this concept clear, let's look at the Health Service, which Lao Wang skipped in his WSFC 2016 series and covers here.
Basically, the Health Service can be understood as WSFC's built-in monitoring: it collects performance data for cluster applications and components, tracks their working state, and reports their health at different severity levels.
At present, the Health Service only works with S2D. When S2D is enabled in the cluster, the Health Service is enabled by default; it monitors S2D and collects its performance reports. Unlike ordinary event logs, Lao Wang finds the Health Service output very friendly: an administrator can understand it at a glance.
To use the Health Service to check S2D for faults, run:
```powershell
Get-StorageSubSystem Cluster* | Debug-StorageSubSystem
```
Each fault in the output contains the following fields:

- Severity: the severity level of the fault.
- Description: a practical description of the problem.
- Recommendation: the recommended next step to resolve the problem.
- Location: if physical locations are defined as fault domains, the fault is shown nested by site, rack, chassis, and server.
- Resource description: the description of the affected resource, which is also shown nested when fault domains are defined.
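To print only the fields listed above, the fault objects can be piped to `Format-List`. This is a sketch: the property names used here (`PerceivedSeverity`, `Reason`, `RecommendedActions`, `FaultingObjectLocation`, `FaultingObjectDescription`) are our assumption about the fault object, which is why the script first lists the real property names with `Get-Member`.

```powershell
# Sketch: show current S2D faults with only the fields described above.
# The property names are assumptions about the fault object;
# Get-Member prints the real ones so you can adjust the list.
$faults = Get-StorageSubSystem Cluster* | Debug-StorageSubSystem

if ($faults) {
    # Discover the actual property names on this system
    $faults | Get-Member -MemberType Property | Format-Table Name

    # Severity, problem, recommendation, then nested location and description
    $faults | Format-List PerceivedSeverity, Reason, RecommendedActions,
        FaultingObjectLocation, FaultingObjectDescription
} else {
    Write-Output "No faults reported by the Health Service."
}
```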
Why does Lao Wang call it friendly compared with ordinary monitoring software? Because its errors are very clear: it tells you directly that a network cable has come loose, which network adapter or which server has lost connectivity, or which disk has dropped out.
As shown in the figure, a Critical-level fault reports that hv01 is missing, and the location and description fields below it automatically show where it is according to the nested fault domain hierarchy.
`Get-StorageSubSystem Cluster* | Debug-StorageSubSystem` can only be run when S2D is enabled in the cluster.
By default, it shows faults that affect the overall operation of the S2D cluster, most of which relate to hardware or configuration.
You can also run:

```powershell
Get-Volume -FileSystemLabel <Label> | Debug-Volume
Get-FileShare -Name <Name> | Debug-FileShare
```
These return only faults that affect the specified volume or file share, which usually relate to capacity planning or resiliency configuration.
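To run the volume-level check across all volumes instead of one label at a time, a loop like the following can be used. It is a sketch that assumes `Get-Volume` on the node returns the S2D volumes with their labels.

```powershell
# Sketch: run the volume-level check for every labelled volume.
# Assumes Get-Volume on this node returns the S2D volumes with labels.
foreach ($vol in (Get-Volume | Where-Object { $_.FileSystemLabel })) {
    Write-Host "=== $($vol.FileSystemLabel) ==="
    $vol | Debug-Volume
}
```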
Besides fault reporting, the other main capability of the Health Service is performance collection: it gathers useful performance metrics while S2D runs, such as CPU utilization, IOPS, and capacity.
To display the overall S2D performance report, run:

```powershell
Get-StorageSubSystem Cluster* | Get-StorageHealthReport
```

To display the S2D performance report as a given number of samples taken at one-second intervals:

```powershell
Get-StorageSubSystem Cluster* | Get-StorageHealthReport -Count <Count>
```

To display a performance report for a single volume or a single server (storage node) in S2D:

```powershell
Get-Volume -FileSystemLabel <Label> | Get-StorageHealthReport -Count <Count>
Get-StorageNode -Name <Name> | Get-StorageHealthReport -Count <Count>
```
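Because `-Count` returns a short series of samples, it can be handy to capture them to a file for later comparison. A minimal sketch; the output folder `C:\Temp` is a hypothetical placeholder and must already exist.

```powershell
# Sketch: capture 10 one-second samples of the cluster-wide report
# and save them to a timestamped text file (hypothetical path).
$stamp  = Get-Date -Format "yyyyMMdd-HHmmss"
$report = Get-StorageSubSystem Cluster* | Get-StorageHealthReport -Count 10
$report | Out-File -FilePath "C:\Temp\s2d-health-$stamp.txt"

# Also show the samples on screen
$report
```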
The Health Service can also show important actions that S2D is currently performing:

```powershell
Get-StorageHealthAction
```

This returns an entry when S2D is currently doing any of the following:

- Retiring a physical disk that has failed, lost connectivity, or become unresponsive
- Switching the storage pool to use a replacement physical disk
- Restoring full resiliency to data
- Rebalancing the storage pool
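`Get-StorageHealthAction` tells you what is happening; to also see how far along repair or rebalance work is, it can be paired with `Get-StorageJob` from the Storage module. This pairing is a suggestion for day-to-day use, not part of the Health Service itself.

```powershell
# Sketch: what is S2D doing now, and how far along is it?
Get-StorageHealthAction

# Repair and rebalance work is tracked as storage jobs (Storage module);
# the default view shows each job's state and percent complete
Get-StorageJob
```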
At present, the Health Service mainly provides these three functions: fault reporting, performance collection, and action tracking. Microsoft's goal is to help cluster administrators monitor and operate clusters more efficiently with practical fault logs and performance metrics. Fault logs are integrated with fault domains, so when an error occurs the fault's location is shown nested in the fault domain hierarchy, helping administrators pinpoint the problem. Today this only applies to S2D; Lao Wang hopes more cluster features will support the Health Service in the future.
In this article, Lao Wang mainly discussed the concepts above. For implementation steps, please refer to his other blog post, WSFC2016 fault domain site awareness.
© 2024 shulou.com SLNews company. All rights reserved.