Maintenance of disk array and MSCS

2025-04-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article describes in detail the routine maintenance of an IBM disk array used with MSCS, and explains how to resolve the problems that may occur.

I. Maintenance of disk arrays

Basic knowledge

1. Four main states of the array:

. Online: the state of the array on the cluster node that has control of it.

. Offline: the state of the array on a cluster node that does not have control of it, or on a node that has control while the array is offline.

. Critical: in a cluster, an array in this state must not be switched to the other node. It must be recovered on the machine that originally had control, by a Rebuild or another recovery operation.

. Blocked: occurs only at RAID 0. In a cluster, an array in this state must not be switched, read, or written, and it must be recovered on the machine that originally had control.
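
The restrictions attached to these four states can be written down as a small lookup. The following is a minimal sketch in Python and is not part of any IBM or Microsoft tool; the names `ArrayState`, `switch_forbidden`, and `io_forbidden` are hypothetical and exist only for illustration. It encodes only what the list above states: switching is forbidden in Critical and Blocked, and reads and writes are forbidden in Blocked.

```python
from enum import Enum


class ArrayState(Enum):
    """The four array states listed above (hypothetical names for illustration)."""
    ONLINE = "Online"
    OFFLINE = "Offline"
    CRITICAL = "Critical"
    BLOCKED = "Blocked"


# States in which the text forbids switching the array to the other node.
SWITCH_FORBIDDEN = {ArrayState.CRITICAL, ArrayState.BLOCKED}


def switch_forbidden(state: ArrayState) -> bool:
    """Return True if the array must first be recovered on its original node."""
    return state in SWITCH_FORBIDDEN


def io_forbidden(state: ArrayState) -> bool:
    """Return True if reads and writes are not allowed (Blocked only)."""
    return state is ArrayState.BLOCKED
```

For example, `switch_forbidden(ArrayState.CRITICAL)` is true, while `io_forbidden(ArrayState.CRITICAL)` is false, because a Critical array can still be read and written on the node that has control.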

2. Two main states of the disk:

. Online: the hard drive light is green, or the LED is off (depending on the array cabinet model). The array status is Online.

. Defunct (offline or failed): the hard drive light is red. The array status is Offline, Critical, or Blocked.

3. After each switch, the disk array synchronizes its data, and the hard disk lights flash regularly for about 2 hours (the duration depends on the capacity of the array). Other operations can still be performed during this time, but do not power off the array or hot-swap disks; otherwise the array information will be lost.

4. View the firmware version of a hard disk:

In the physical disk group of ServeRAID Manager, click the hard drive you want to check; its firmware version number is displayed on the screen.

Note: the version must be above 1.09 (or S96E).

5. Check the firmware and BIOS versions of the array card:

In ServeRAID Manager, click the controller card you want to check; the firmware and BIOS version numbers of the array card are displayed on the screen.

Note: the firmware version should be above 3.70, and the BIOS version should be above 4.0.
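
Once the version numbers have been read from the screen, comparing them with the thresholds is simple arithmetic. The sketch below is an illustration only: `parse_version` and `is_above` are hypothetical helpers, not ServeRAID functions, and the version strings must be typed in by hand. It reads "above" as strictly greater; if the thresholds are meant as minimums, change `>` to `>=`. The alphanumeric disk firmware code (S96E) is left out because the text does not define how such codes are ordered.

```python
from typing import Tuple

# Thresholds taken from the notes above.
CONTROLLER_FIRMWARE_THRESHOLD = "3.70"
CONTROLLER_BIOS_THRESHOLD = "4.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '3.70' into a tuple of integers, (3, 70)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_above(found: str, threshold: str) -> bool:
    """Return True if `found` is strictly higher than `threshold`."""
    return parse_version(found) > parse_version(threshold)
```

For example, `is_above("4.10", CONTROLLER_BIOS_THRESHOLD)` is true, and `is_above("3.70", CONTROLLER_FIRMWARE_THRESHOLD)` is false under the strict reading.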

Observing symptoms

1. Check the status lights on the front panel of the array cabinet.

Each hard disk in the array cabinet generally has two indicators: a status light (red) and a read/write activity light (green).

. Green lights flashing irregularly on several disks: the disks are currently being read or written (the green light is brighter in this case), and the array is Online.

. Green lights flashing on all disks: the array is synchronizing (the green light is dim in this case), and the array is Online.

. Green light off: there is currently no disk activity, and the array is Online.

. Red light on a single hard drive: the disk status is DDD (unavailable) or Offline.

. Green and orange lights flashing alternately and regularly on a hard disk: the disk is being rebuilt.

. Red lights on more than two hard drives: the array cabinet has failed, and the cluster will go down.
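
The indicator patterns above amount to a lookup table. The following Python sketch shows one way to keep them at hand, for instance in an operator checklist script; the key wording and the function name `describe_led` are assumptions made for illustration, and nothing here reads the lights from hardware.

```python
# Keys are (light colour, pattern, scope); values are the meanings given above.
LED_MEANINGS = {
    ("green", "irregular flashing", "several disks"):
        "Disks are being read or written; array is Online",
    ("green", "flashing", "all disks"):
        "Array is synchronizing; array is Online",
    ("green", "off", "any disk"):
        "No current disk activity; array is Online",
    ("red", "on", "single disk"):
        "Disk status is DDD (unavailable) or Offline",
    ("green/orange", "alternating flashing", "single disk"):
        "Disk is being rebuilt",
    ("red", "on", "more than two disks"):
        "Array cabinet has failed; cluster will go down",
}


def describe_led(colour: str, pattern: str, scope: str) -> str:
    """Look up what an observed indicator pattern means."""
    return LED_MEANINGS.get((colour, pattern, scope),
                            "Not covered by the list above")
```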

2. Check with the ServeRAID Manager management tool

Start ServeRAID Manager on the node that has control.

. The controller and the logical drives should be in the OK state.

. The physical hard disks that make up the array should be in the Online state. (If there is a Hot Spare disk, its status is Hot Spare on this machine and Ready on the other node.)

. If a Hot Spare hard drive exists, it can be found in the Hot Spare menu.

. If the status of a physical hard disk is DDD, the disk is no longer available and needs to be repaired or replaced.

. If the status of a physical hard disk is Offline, the disk is offline but not damaged.

. In RAID 1, RAID 1E, RAID 5, and RAID 5E, if one hard disk is DDD or Offline, the array or logical drive is in the Critical state.

. In RAID 0, if one hard disk is DDD or Offline, the array or logical drive is in the Blocked state, and no operation can be performed on it. After recovery, the Blocked state must be set to Unblocked manually.

Note: on the node that does not have control, the disks of the array are shown as Defunct (the Hot Spare disk is shown as normal).
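
The two rules that relate disk status to array status can be expressed as a short function. This is a sketch under stated assumptions rather than the logic ServeRAID Manager itself uses: the function name `logical_drive_state` and the return value "Unknown" are hypothetical, and the case of several failed disks in a redundant array is reported as "Unknown" because the rules above cover only a single failed disk.

```python
from typing import List

# RAID levels that the text says become Critical when one disk fails.
REDUNDANT_LEVELS = {"1", "1E", "5", "5E"}


def logical_drive_state(raid_level: str, disk_states: List[str]) -> str:
    """Derive the array state from its RAID level and member disk states."""
    failed = sum(1 for state in disk_states if state in ("DDD", "Offline"))
    if failed == 0:
        return "OK"
    level = raid_level.upper()
    if level == "0":
        return "Blocked"
    if level in REDUNDANT_LEVELS and failed == 1:
        return "Critical"
    # Anything else is outside the rules stated in the text.
    return "Unknown"
```

For example, a RAID 5 array with one DDD disk yields "Critical", and a RAID 0 array with one Offline disk yields "Blocked".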

Handling abnormal disk states

The operations below must be performed on the host that has control of the disk array.

1. A single disk in DDD status. Cluster switching is prohibited at this time (the standby machine can be disabled to prevent it).

Note: DDD status does not necessarily indicate a physical failure of the hard disk. Depending on how the disk is used, handle it as follows:

. The disk is an array disk and the node has a Hot Spare disk: when the disk fails, the Hot Spare disk takes over automatically, the array enters the Rebuild state automatically, and the disk state changes to Hot Spare. If the Rebuild does not start automatically, perform the Rebuild manually, and when it finishes, set the disk to the Hot Spare state. If the manual Rebuild fails, unplug the disk, reinsert it into the disk cabinet after one minute, and repeat the operation. If it still fails, the disk may have a physical fault.

. The disk is an array disk and the node has no Hot Spare disk: select the disk and right-click it to perform a Rebuild. If the operation fails, unplug the disk, reinsert it into the disk cabinet after one minute, and repeat the operation. If it still fails, the disk has a physical fault.

. The disk is a Hot Spare disk: select the disk, right-click it, and run Delete Hot Spare to remove the disk from the Hot Spare state; then set the disk to Hot Spare again (Replace and Rebuild can also be used). If the operation fails, unplug the disk, reinsert it into the disk cabinet after one minute, and repeat the operation. If it still fails, the disk may have a physical fault.

2. A single disk in Offline status

Set the disk to Online manually. If that fails, first shut down the standby machine (the one without control), restart the host, and set the disk to Online again. If that also fails, unplug the disk from the disk cabinet, reinsert it after one minute, shut down the standby machine again, and then restart the host and the standby machine in turn.

In the following two situations, first shut down machine B (the standby machine) to prevent the system from switching.

3. Two disks in Offline status

First set one of the disks to Online and perform a Rebuild on the other, then restart the host.

4. One Offline, one DDD

Set the Offline disk to Online, Rebuild the DDD disk, and restart the host when it is finished.

5. When the status of a hard disk is Defunct, restore it as follows:

. Open ServeRAID Manager.

. Select the Defunct hard drive and right-click it.

. Use Replace and Rebuild to rebuild the data on the hard disk.

. Follow the on-screen prompts to unplug the hard drive and then reinsert it.
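
Cases 1 to 4 above can be summarized as a lookup keyed by how many disks are Offline and how many are DDD. The Python sketch below is a memory aid only and does not talk to the controller; the names `RECOVERY_STEPS` and `recovery_steps` are hypothetical. As a simplification, the single DDD entry covers a failed array disk and leaves out the separate procedure for a failed Hot Spare disk.

```python
from typing import Dict, List, Tuple

# Keys are (number of Offline disks, number of DDD disks).
RECOVERY_STEPS: Dict[Tuple[int, int], List[str]] = {
    (0, 1): [
        "Prevent cluster switching",
        "Rebuild the disk (automatic with a Hot Spare, otherwise manual)",
        "If the Rebuild fails, unplug the disk, reinsert it after one minute, retry",
    ],
    (1, 0): [
        "Set the disk to Online manually",
        "If that fails, shut down the standby, restart the host, set Online again",
        "If that fails, reseat the disk after one minute, restart host and standby",
    ],
    (2, 0): [
        "Shut down machine B to prevent switching",
        "Set one disk to Online",
        "Rebuild the other disk",
        "Restart the host",
    ],
    (1, 1): [
        "Shut down machine B to prevent switching",
        "Set the Offline disk to Online",
        "Rebuild the DDD disk",
        "Restart the host",
    ],
}


def recovery_steps(offline: int, ddd: int) -> List[str]:
    """Return the documented steps, or an empty list if the case is not covered."""
    return list(RECOVERY_STEPS.get((offline, ddd), []))
```

An empty result, for example from `recovery_steps(3, 0)`, means the combination is not described in this article and should not be handled by guesswork.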

Handling disk array exceptions

1. When the array is in the Critical state, you only need to rebuild the failed hard disk on the machine that originally had control.

2. When the array is in the Blocked state, do the following:

. To ensure that the array can be recovered, first shut down the machine that did not have control.

. Restart the machine that had control. The system prompts: press F4 to correct the error, or F5 to accept the current configuration.

. Press F4 to correct the current error; this changes the Blocked state to the Critical state.

. The system then rebuilds the hard disk automatically.

The progress of the hard disk Rebuild is displayed in the status bar at the bottom of the ServeRAID Manager window.

II. Maintenance of MSCS

The maintenance of MSCS is closely tied to the maintenance of the array. If the array is working normally, MSCS generally works normally too; however, if some services in the cluster cannot be started or are damaged, MSCS may work abnormally.

The following are the daily maintenance instructions:

1. First check the working status of the RAID (using IBM ServeRAID Manager).

2. Use Cluster Administrator to check the status of each service; all resources should be Online.

3. If a service or resource is in the Offline state, find out the reason first, and then set it to Online manually.

4. If a disk or the disk array is abnormal, handle it as described under maintenance of disk arrays.

Note: when the array is in the Critical state, switching must be prevented (by shutting down the standby machine).

5. After an abnormal power outage (all devices powered off), start the system in the following order:

. Start the array cabinet first.

. After the array cabinet has powered on, start the node that had control before the power loss.

. After that node has finished booting, start the other node.

The two-machine system includes a primary domain controller, so the primary domain controller should be started first.

6. In an emergency, shut down in the following order:

. First shut down the node in the standby state.

. Then shut down the node that has control.

. Finally, shut down the disk array.

In principle, the array cabinet should not be powered off, especially while the array is being read or written.
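
The power-on order in item 5 and the shutdown order in item 6 mirror each other, which the following Python sketch makes explicit. The function names and the example node names are hypothetical, the sketch only returns the order as a list and powers nothing on or off, and it does not model the primary domain controller requirement.

```python
from typing import List


def power_on_order(controlling_node: str, other_node: str) -> List[str]:
    """Order for starting up after a full power outage (item 5)."""
    return ["array cabinet", controlling_node, other_node]


def shutdown_order(controlling_node: str, standby_node: str) -> List[str]:
    """Order for an emergency shutdown (item 6)."""
    return [standby_node, controlling_node, "array cabinet"]


if __name__ == "__main__":
    # "node-a" had control before the outage; both names are placeholders.
    for step, device in enumerate(power_on_order("node-a", "node-b"), start=1):
        print(step, device)
```

In both directions the array cabinet is running whenever any node is running: it is the first device started and the last one shut down.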

7. Under special circumstances, the cluster may fail to start. The usual cause is that the node does not have control of the disk array. In that case, run ipshahto.exe from the command line to forcibly take control.

It is recommended to perform this step under the guidance of a technician.

8. While a hard disk is being rebuilt, switching is not allowed; while the array is synchronizing, avoid switching if possible.
