Symptom: after a Windows Server 2012 R2 failover cluster was deleted abnormally and the system was reinstalled, the shared disks showed up as RAW, and formatting could not be completed after the disks were brought online.
Formatting could not be completed, and attempting to delete the volume prompted that the volume was in use.
Checking the logs showed related warnings and errors.
Analysis: because the cluster was deleted abnormally, these disks are still locked by the previous cluster's PR Key (SCSI-3 Reservation), so they remain occupied and cannot be accessed or formatted.
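Before cleaning anything up, a quick way to confirm this picture is to list the disks with the Storage module cmdlets that ship with Windows Server 2012 R2; the properties selected below are a suggestion, not part of the original report.

# Hedged inspection sketch: disks held by a stale reservation typically show up
# here as offline/read-only or with PartitionStyle RAW.
Get-Disk |
    Select-Object Number, FriendlyName, OperationalStatus, PartitionStyle,
                  IsOffline, IsReadOnly, IsClustered |
    Format-Table -AutoSize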
Resolution:
On one of the nodes, run the following command to clear the reservation for each affected disk. The value after -Disk is the disk number shown in Disk Management; see https://technet.microsoft.com/en-us/library/ee461016.aspx for the command reference.
Command for clearing the cluster disk reservation:
Clear-ClusterDiskReservation -Disk <DiskNumber>
After running it for each affected disk in turn, the disks returned to a normal state.
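For reference, a minimal end-to-end cleanup sketch built around that command. The disk number 3 is a placeholder for the number shown in Disk Management, the FailoverClusters and Storage modules are assumed to be available, and step 3 destroys any data still on the disk.

Import-Module FailoverClusters

$diskNumber = 3   # placeholder: replace with the affected disk's number

# 1. Release the persistent reservation left behind by the old cluster.
Clear-ClusterDiskReservation -Disk $diskNumber -Force

# 2. Bring the disk online and make it writable again.
Set-Disk -Number $diskNumber -IsOffline $false
Set-Disk -Number $diskNumber -IsReadOnly $false

# 3. If the disk now shows as uninitialized (RAW), re-initialize and format it.
#    WARNING: this destroys whatever data is still on the disk.
Initialize-Disk -Number $diskNumber -PartitionStyle GPT
New-Partition -DiskNumber $diskNumber -UseMaximumSize -AssignDriveLetter |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel 'ClusterData' -Confirm:$false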
Summary: when rebuilding a cluster, be sure to evict the cluster nodes and remove the cluster disks in turn, and only then destroy the cluster, so that the shared disks are not left locked by the previous cluster's PR Key.
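A sketch of that teardown order using standard FailoverClusters cmdlets; the node name is a placeholder and the sequence should be adapted to the environment.

Import-Module FailoverClusters

# 1. Evict the other nodes one at a time, leaving a single node in the cluster.
Remove-ClusterNode -Name 'NODE2' -Force

# 2. On the last node, destroy the cluster so that its disks and their
#    persistent reservations are released cleanly (and its AD objects removed).
Remove-Cluster -Force -CleanupAD

# 3. If a node had to be evicted while it was unreachable, clear the leftover
#    cluster configuration on it once it is back online.
Clear-ClusterNode -Force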
Background knowledge:
A SCSI lock is the basic mechanism that coordinates multiple hosts operating on the same LUN. In a Windows storage environment, SCSI locks come into play whenever several Windows hosts need to access a single LUN, as in a Windows Cluster environment.
How it works:
In a shared storage environment, multiple hosts may access the same storage device at the same time. If several hosts write to a LUN simultaneously, the LUN has no way to know which data was written first and which later. To prevent the data corruption this would cause, the concept of a SCSI lock is introduced. SCSI locking is implemented through the SCSI Reservation mechanism, and the vast majority of disks support the SCSI reservation commands. When one host sends a SCSI Reservation command to a disk, the disk becomes locked to other hosts. If another host then sends a read or write request to the locked disk, it receives a 'reservation conflict' error. The lock is released when the host holding it crashes, or when another host sends a 'break reservation' or 'reset target' command to the disk. After that, the second host must send its own SCSI Reservation command to the disk before it can issue SCSI I/O requests.
Classification of SCSI locks:
There are two types of SCSI locks: SCSI-2 Reservation and SCSI-3 Reservation. Only one type of SCSI lock can exist on a LUN.
SCSI-2 Reservation only allows the device to be accessed by the initiator that issued the SCSI lock, that is, by a single HBA on the host. For example, if HBA1 on host 1 places a SCSI-2 lock on a LUN, even HBA2 on the same host 1 cannot access that LUN. For this reason SCSI-2 Reservation is also called Single Path Reservation.
SCSI-3 Reservation, also called Persistent Reservation, uses a PR Key to lock the disk. A host normally has a unique PR Key, and different hosts have different PR Keys, which is why SCSI-3 Reservation is typically used in multipath, shared environments.
SCSI locks in Windows Cluster:
Windows Server 2003 clusters use the SCSI-2 reserve/release commands. Because this is a non-persistent reservation, the cluster node that holds the SCSI-2 Reservation refreshes it every 3 seconds. If a failover occurs, the node taking over places a SCSI-2 Reservation on the appropriate disks and maintains the SCSI lock from then on. If the cluster service is stopped on all nodes, the Reservation is not retained. Windows Server 2008 and later clusters use the SCSI-3 persistent reservation mechanism. If the cluster disks (Cluster Disks) are not removed from the hosts correctly, those disks keep their Reservations: the corresponding SCSI locks remain on the disks even after the cluster service is shut down or the disks are unmapped from the hosts. In such cases the Reservation has to be removed from the disk by force (a detection sketch follows the device-locking notes below).
Under what circumstances will a device be locked?
In general, a device is locked when it is opened, for example by varyonvg, dd, and so on. Note that for a command like dd, the device is locked while the command runs and is automatically unlocked when it finishes.
Note: varyonvg -c does not lock the device.
In addition, once a volume group has been varied on, only varyoffvg or varyonvg -b will unlock the devices belonging to that VG. Running the shutdown command directly does not perform a varyoffvg, so the devices are not unlocked.
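Back on Windows: a cluster disk that was never removed cleanly keeps its persistent reservation even with the cluster service stopped, as described above. A minimal detection sketch, assuming the suspect disk is number 3 (a placeholder) and the Windows Server 2012 R2 Storage cmdlets are available:

$diskNumber = 3   # placeholder: the suspect shared disk

try {
    Set-Disk -Number $diskNumber -IsOffline $false -ErrorAction Stop
    Set-Disk -Number $diskNumber -IsReadOnly $false -ErrorAction Stop
    Write-Host "Disk $diskNumber came online normally - no stale reservation apparent."
}
catch {
    Write-Warning "Disk $diskNumber could not be brought online: $($_.Exception.Message)"
    Write-Warning "A leftover SCSI-3 persistent reservation is one likely cause;"
    Write-Warning "Clear-ClusterDiskReservation -Disk $diskNumber can release it."
}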
How does the cluster service reserve a disk and bring it online?
The cluster service uses only the SCSI protocol to manage disks on the shared bus.
Note: this does not mean that every disk must physically be a SCSI device attached to a SCSI hardware interface; rather, the storage unit must be able to correctly interpret and handle SCSI protocols and commands.
The following SCSI commands are the other SCSI protocol features used on disks in a clustered environment:
Reserve: issued through the host bus adapter that acquires or retains ownership of the SCSI device. A reserved device rejects all commands from all other host bus adapters except the initiator that originally reserved it.
Release: issued by the owning host bus adapter when the disk resource is taken offline; it releases the SCSI device so that another host bus adapter can reserve it.
Reset: breaks the reservation on the target device. The reset can apply to the entire bus, or (with the Storport driver) target a specific device on the bus.
The following procedure describes how the server cluster starts up and takes control of the shared disks. This scenario assumes that only one node is powered on at a time:
When the computer starts, the Cluster disk driver (Clusdisk.sys) reads the following local registry key to obtain a list of shared disk signatures managed by the cluster:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\Signatures
Once the list is obtained, the cluster service scans all devices on the shared SCSI bus for matching disk signatures.
When the cluster disk driver starts on the first node in the cluster, every LUN (logical unit number, the unique identifier used on a SCSI bus to distinguish devices sharing the same bus) whose signature matches the list is marked as an offline volume. Note that this is not the same thing as the cluster resource being offline: the volumes are marked offline to prevent more than one node from having write access to them at the same time. If the cluster uses shared disks, the cluster service designates one of them as the quorum disk, and the quorum disk is the first resource brought online when the cluster service attempts to form the cluster.
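As a hedged side illustration of this signature matching, the sketch below lists whatever the ClusDisk Parameters key contains on a node, along with the cluster's Physical Disk resources and their DiskSignature parameter; the exact layout of the registry key and the parameter name for MBR disks are assumptions that may vary by Windows version.

Import-Module FailoverClusters

# Signatures recorded under the registry key quoted above.
$key = 'HKLM:\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\Signatures'
if (Test-Path $key) {
    Write-Host 'Entries under ClusDisk\Parameters\Signatures:'
    (Get-Item $key).GetValueNames() | ForEach-Object { Write-Host "  value:  $_" }
    Get-ChildItem $key -ErrorAction SilentlyContinue |
        ForEach-Object { Write-Host "  subkey: $($_.PSChildName)" }
}

# Cluster Physical Disk resources and their DiskSignature parameter (MBR disks).
Get-ClusterResource |
    Where-Object { $_.ResourceType -like 'Physical Disk' } |
    ForEach-Object {
        $sig = ($_ | Get-ClusterParameter -Name DiskSignature -ErrorAction SilentlyContinue).Value
        [pscustomobject]@{ Resource = $_.Name; State = $_.State; Signature = $sig }
    } | Format-Table -AutoSize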
When the cluster service starts on the forming node, it first attempts to bring online the physical device designated as the quorum disk. It runs the disk arbitration algorithm on the quorum disk to gain ownership. Successful arbitration starts with the cluster service asking clusdisk to begin periodic reservations of the disk (to retain ownership). The cluster service then asks clusdisk to unblock access to the quorum disk and mounts the volumes on it. Once the volumes are mounted successfully, the online process is complete and the cluster service continues with forming the cluster. These requests are passed from the cluster disk driver through the Microsoft storage driver stack and ultimately to the HBA driver specific to the disk; they may also pass through any multipathing software running in the storage stack. For more information about the storage stack and driver model, refer to Microsoft's documentation.
After the storage controller / device driver reports that the device has been reserved successfully, the cluster service verifies that the drive can be read from and written to. Once the disk has passed all of these checks, the disk resource is marked online and the cluster service continues to bring all other resources online.
Every three seconds, each node in the cluster renews the reservations for any LUNs it owns. If a node loses network communication with another node (for example, there is no communication over either the private or the public network), the nodes begin a process known as arbitration to determine ownership of the quorum disk. The node that wins arbitration for the quorum disk resource remains operational when communication between cluster nodes is completely lost. On any node that cannot communicate with the quorum owner and cannot maintain or gain ownership of the quorum disk, the cluster service terminates, and the resources hosted on that node are moved to another node in the cluster.
The node that currently owns the quorum disk is the defender. The defender assumes it is defending against every cluster node that it cannot communicate with and from which it has received no shutdown notification. As part of arbitration, the defender keeps renewing its reservation on the LUN every three seconds via SCSI reserve requests. All other nodes (nodes that do not own the quorum disk and cannot communicate with the node that owns the quorum resource) become challengers. When a challenger detects the complete loss of communication, it immediately requests a bus-wide SCSI reset to break any existing reservations. Seven seconds after the SCSI reset, the challenger attempts to reserve the quorum disk. If the defender node is up and working properly, it will already have re-reserved the quorum disk, as it does every three seconds; the challenger then detects that it cannot reserve the quorum and its cluster service terminates. If the defender is not working properly, the challenger can successfully reserve the quorum disk. After 10 seconds, the challenger brings the quorum online and takes ownership of all the resources in the cluster. If the defending node loses ownership of the quorum device, the cluster service on that node terminates immediately.
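Purely as an illustration, the toy function below restates the defender/challenger timing just described; the 3-, 7- and 10-second figures come from the text, and the logic is a sketch rather than the cluster service's actual implementation.

function Invoke-QuorumArbitrationSketch {
    param([bool]$DefenderAlive)

    Write-Host 't = 0s : challenger detects loss of communication and issues a bus-wide SCSI reset'
    Write-Host 't = 7s : challenger attempts to reserve the quorum disk'

    if ($DefenderAlive) {
        # A healthy defender re-reserves the LUN every 3 seconds, so by t = 7s
        # the reservation is back and the challenger's attempt fails.
        Write-Host '         defender re-reserved in time -> challenger cluster service terminates'
    }
    else {
        Write-Host '         reservation succeeds -> challenger wins arbitration'
        Write-Host 't = 10s: challenger brings the quorum online and takes ownership of cluster resources'
    }
}

Invoke-QuorumArbitrationSketch -DefenderAlive $false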
When a cluster node takes a disk resource offline, it asks for the SCSI reservation on the drive to be released, and the drive once again becomes unavailable to the operating system. As long as the cluster's disk resources are offline, the volumes the resources point to (disks with matching signatures) are not accessible to the operating system on any cluster node.