2025-01-20 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article describes how to recover data after hard disks drop offline in an HP MSA storage array. The walkthrough is detailed and is intended as a reference for anyone facing a similar failure.
1. HP MSA storage device information
1. The storage space consists of eight 450GB SAS hard drives.
2. Seven of the hard drives form a RAID5 array, and one is used as a hot spare.
2. Fault description of HP MSA storage equipment
1. Two hard drives in the RAID5 array failed and only one hot spare was successfully activated, which brought down the RAID5 array and made the upper-layer LUNs unusable.
2. Some disks in the RAID array went offline, making the entire storage unavailable. All disks were therefore physically inspected first, and no physical failure was found. A bad-sector detection tool was then run against the disks, and no bad sectors were found.
3. Backing up the HP MSA storage data
To keep the data safe and the work reversible, all source data must be backed up before recovery starts, in case something else goes wrong and the data can no longer be recovered. Use the dd command or the WinHex tool to image every disk to a file. Part of the data after backup is shown in the following figure.
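The imaging step can be sketched in Python as follows. This is only an illustration of the idea, not the tool used in the case: the function name image_disk, the chunk size and the device path in the usage note are all assumptions. It copies a source to an image file in fixed-size chunks and zero-fills anything it cannot read, which is roughly the behaviour of dd with conv=noerror,sync.

```python
import io


def image_disk(src, dst, total_size, chunk_size=1024 * 1024):
    """Copy total_size bytes from src to dst in chunks.

    Unreadable or short chunks are padded with zeros so that every offset
    in the image matches the same offset on the source disk.
    Returns a list of (offset, length) ranges that could not be read.
    """
    bad_ranges = []
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(length)
        except OSError:
            # Treat an I/O error as "nothing readable in this chunk"
            data = b""
        if len(data) < length:
            # Record the gap and pad it so later offsets stay aligned
            bad_ranges.append((offset + len(data), length - len(data)))
            data = data + b"\x00" * (length - len(data))
        dst.write(data)
        offset += length
    return bad_ranges
```

A hypothetical call would open a raw device such as /dev/sdb in "rb" mode and an image file in "wb" mode, then pass both with the disk size in bytes. The returned ranges show which parts of the image are padding rather than real data.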
4. Fault analysis of HP MSA storage
1. Analyze the cause of the failure
The failure was probably caused by unstable reads and writes on some of the disks. The HP MSA2000 controller applies a strict disk-checking policy: once a disk's performance becomes unstable, the controller treats it as a bad disk and kicks it out of the RAID group. Once the number of disks dropped from the RAID group exceeds what the RAID level can tolerate, the RAID group becomes unavailable, and so do the LUNs built on it. The preliminary finding is that there are 6 LUNs on the RAID group, all assigned to an HP-UX minicomputer, with LVM logical volumes on top; the important data are an Oracle database and an OA server.
2. Analyze the structure of the RAID group
The LUNs on the HP MSA2000 are built on the RAID group, so the underlying RAID group must be analyzed first and then reconstructed from the information obtained. Analysis of each data disk shows that the data on disk 4 differs from that on the other disks, so it is tentatively taken to be the hot spare disk. The remaining data disks are then analyzed: from the distribution of Oracle database pages across the disks, the key RAID group parameters such as stripe size, disk order and data direction are derived.
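To show why stripe size, disk order and data direction are enough to rebuild the array, here is a minimal Python sketch that maps a logical offset in a RAID5 volume to a member disk and an offset on that disk. It assumes a left-symmetric parity rotation, which is a common layout but is not confirmed by the article for the MSA2000. The function names and the 64KB stripe unit in the usage note are illustrative assumptions.

```python
def raid5_locate(unit_index, n_disks):
    """Locate a data stripe unit in a left-symmetric RAID5 layout (assumed).

    Returns (data_disk, row, parity_disk), where the disk numbers are
    positions in the reconstructed disk order.
    """
    data_per_row = n_disks - 1
    row = unit_index // data_per_row
    k = unit_index % data_per_row
    # Parity starts on the last disk and moves one disk left per row
    parity_disk = (n_disks - 1) - (row % n_disks)
    # Data units follow the parity disk and wrap around
    data_disk = (parity_disk + 1 + k) % n_disks
    return data_disk, row, parity_disk


def byte_offset_to_disk(offset, n_disks, stripe_unit_size):
    """Translate a logical byte offset into (disk, byte offset on that disk)."""
    unit, within = divmod(offset, stripe_unit_size)
    disk, row, _ = raid5_locate(unit, n_disks)
    return disk, row * stripe_unit_size + within
```

For the seven-disk array in this case, byte_offset_to_disk(offset, 7, 64 * 1024) would give the location of any logical byte under these assumptions. If the recovered Oracle pages do not come out in sequence, the assumed stripe size, disk order or rotation is wrong and has to be revised.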
3. Analyze the order in which disks dropped out of the RAID group
Using the RAID information obtained above, we try to virtualize the original RAID group with the RAID virtualization program developed in-house by North Asia. Because two disks dropped out of the RAID group, the order in which they dropped must be determined. Careful analysis of the data on each disk shows that, within the same stripe, one disk's data is clearly different from that of the others, so this disk is tentatively judged to be the first to go offline. Checking this stripe with the RAID parity-check program developed in-house by North Asia shows that the result is best when this disk's data is excluded, which identifies it as the first disk to go offline.
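The parity check behind this judgment can be sketched as follows. This is not North Asia's program, only an illustration of the principle: in a consistent RAID5 stripe, the XOR of all member blocks is zero. The sketch assumes each row holds one block per disk image, read at the same offset, with one stale image among them. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_holds(blocks):
    """True if the blocks XOR to all zeros, as a consistent RAID5 stripe does."""
    return not any(xor_blocks(blocks))


def score_exclusions(rows):
    """Count, per disk, the rows whose parity holds when that disk is left out.

    rows is a list of stripe rows; each row is a list with one block per
    disk image. The disk with the highest score is the most likely stale one.
    """
    n_disks = len(rows[0])
    scores = [0] * n_disks
    for row in rows:
        for skip in range(n_disks):
            rest = [block for i, block in enumerate(row) if i != skip]
            if parity_holds(rest):
                scores[skip] += 1
    return scores
```

Sampling rows from regions that were written after the first disk dropped gives the clearest signal. Regions that never changed are consistent on every disk and do not discriminate.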
4. Analyze the LUN information in the RAID group
First, the allocation of LUNs within the RAID group and the block map assigned to each LUN are analyzed. Since there are six LUNs on the array, only the block map of each LUN needs to be extracted. A program is then written from this information to parse the data map of every LUN and export the data of all LUNs according to those maps.
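Exporting a LUN from its block map can be sketched as below. The article does not describe the MSA2000 on-disk map format, so the extent representation here, a list of (start block, block count) pairs in LUN order, is purely an assumption, as are the function name and the default block size.

```python
import io


def export_lun(raid, extents, out, block_size=512):
    """Write one LUN to out by copying its extents from the virtual RAID.

    raid    -- seekable binary stream over the reconstructed RAID volume
    extents -- hypothetical map: (raid_start_block, block_count) pairs,
               listed in the order they appear inside the LUN
    """
    for raid_start, count in extents:
        raid.seek(raid_start * block_size)
        remaining = count * block_size
        while remaining:
            chunk = raid.read(min(remaining, 1 << 20))
            if not chunk:
                raise EOFError("extent runs past the end of the RAID image")
            out.write(chunk)
            remaining -= len(chunk)
```

Each of the six LUNs would get its own extent list and its own output file.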
5. HP MSA storage LVM logical volume and VXFS file system repair
1. Parsing LVM logical volumes
After analyzing all the exported LUNs, all of them are found to contain HP-UX LVM logical volume information. Parsing the LVM information in each LUN shows three LVM sets: the 45G LVM holds a single LV that stores the OA server data, and the 190G LVM holds a single LV that stores temporary backup data. The remaining four LUNs make up a 2.1T LVM with a single LV that stores the Oracle database files. A program is written to interpret the LVM metadata and the LV volumes in each set, but the interpreter reports an error.
2. Repair the LVM logical volume
The cause of the program error is analyzed carefully: a development engineer debugs the program to locate the error, while a senior file system engineer examines the recovered LUNs to check whether the LVM information was damaged by the storage failure. The inspection confirms that the LVM information was indeed damaged by the storage failure. The damaged areas are repaired manually and the program is modified accordingly, after which the LVM logical volumes are parsed again.
3. Parse the VXFS file system
Set up an HP-UX environment, map the interpreted LV volumes to HP-UX, and try to mount the file system. The mount fails, and an attempt to repair the vxfs file system with the "fsck -F vxfs" command still does not make it mountable. It is suspected that part of the underlying vxfs metadata is damaged and needs to be repaired manually.
4. Repair the VXFS file system
Carefully analyze the parsed LV and verify the integrity of the VXFS file system against the file system's underlying structure. The analysis finds a problem in the underlying VXFS file system: files were undergoing IO when the storage went down, so some file system metafiles were not updated and were left corrupted. These damaged metafiles are repaired manually so that the VXFS file system can be parsed normally. The repaired LV volume is then mounted on the HP-UX machine again. This time the file system reports no error and mounts successfully.
6. Check the Oracle database files and start the database
1. Restore Oracle database files
After mounting the file system on the HP-UX machine, back up all user data to the designated disk space. The user data totals about 1.2TB. Screenshots of some file directories are as follows:
2. Check whether the Oracle database files are complete
Use the Oracle database file verification tool "dbv" to check whether each database file is complete. It finds no errors. Then run the Oracle database checking tool developed in-house by North Asia, which is stricter. It finds that some database files and log files are inconsistent. Senior database engineers repair these files and verify them again until every file passes verification.
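Running dbv over every recovered data file is easy to script. The sketch below only builds the command line for each file, using the documented dbv parameters file, blocksize and logfile. The helper name, the example path and the 8192-byte block size are assumptions: the block size must match the database's actual block size, which the article does not state.

```python
from pathlib import PurePosixPath


def dbv_command(datafile, blocksize=8192, log_dir=None):
    """Build the argument list for one dbv run against a single data file.

    blocksize is an assumed value; set it to the database's real block size.
    """
    cmd = ["dbv", f"file={datafile}", f"blocksize={blocksize}"]
    if log_dir is not None:
        # One log per data file, named after the file (hypothetical convention)
        log = PurePosixPath(log_dir) / (PurePosixPath(datafile).name + ".dbv.log")
        cmd.append(f"logfile={log}")
    return cmd
```

Each list can be passed to subprocess.run on a machine where dbv is installed, for example dbv_command("/u01/oradata/users01.dbf", log_dir="/tmp/dbv"), where both paths are hypothetical.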
3. Start the Oracle database
Since the HP-UX environment we provided does not have this version of Oracle installed, the recovered Oracle database is attached to the HP-UX server in the user's original production environment. The Oracle database is then started, and it starts successfully. Some screenshots are as follows:
7. Verification of HP MSA storage data
With the user's active cooperation, the Oracle database and the OA server are started, and the OA client is installed on a local notebook. The latest and historical data records are verified through the OA client, and the user arranges for staff from different departments to verify the data remotely. The final verification shows that the data is correct and complete, which concludes the data recovery work.
That concludes this walkthrough of recovering from offline hard disks in HP MSA storage. I hope the content above is of some help.