A simple and efficient method for recovering data after an EMC storage crash with an offline RAID group

2025-01-17 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Fault Description:

Two hard disks in a RAID5 array failed while only one hot spare was successfully activated, so the RAID5 array went offline and the LUN built on it could no longer be used. The storage consists of twelve 1TB SATA hard disks: ten form the RAID5 array, and the remaining two serve as hot spares.

Because neither the physical inspection nor the bad-sector scan (step 1 of the resolution process) found a physical fault or bad sectors, the failure was probably caused by unstable reads and writes on some disks. EMC controllers apply a strict disk-checking policy: once a disk becomes unstable, the controller treats it as failed and removes it from the RAID group. Once the number of removed disks exceeds the limit the RAID level allows, the RAID group becomes unavailable, and so do the LUNs built on it. Preliminary investigation shows that there is only one LUN on the RAID group; it is allocated to a SUN minicomputer, and the file system on top of it is ZFS.
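The failure condition described above can be sketched in a few lines of Python. This is only an illustrative model, not EMC controller logic: the table of tolerances and the function names are assumptions introduced here, and the capacity figure ignores any space the array reserves for its own use.

```python
# Illustrative model only: how many missing members each RAID level tolerates.
TOLERANCE = {"RAID0": 0, "RAID5": 1, "RAID6": 2}

def group_online(level, failed, rebuilt_spares):
    """A group stays online while its missing members do not exceed the tolerance.

    A hot spare only counts once its rebuild has completed.
    """
    missing = failed - rebuilt_spares
    return missing <= TOLERANCE[level]

def raid5_usable_tb(members, disk_tb):
    """RAID5 spends the equivalent of one member on parity."""
    return (members - 1) * disk_tb

# The case in this article: two failed disks, one spare activated but never rebuilt.
print(group_online("RAID5", failed=2, rebuilt_spares=0))
print(raid5_usable_tb(members=10, disk_tb=1))
```

With one failed disk and no rebuilt spare the same model reports the group as still online, which is why the second failure was the one that took the LUN down.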

Resolution Process:

1. Hard disk detection

The storage became unavailable because some disks went offline. Therefore, after the disks were received, all of them were physically inspected, and no physical failure was found. A bad-sector detection tool was then run on each disk, and no bad sectors were found.

2. Back up data

For data safety and recoverability, all source data must be backed up before recovery begins, in case a later step makes the data unrecoverable. WinHex was used to image every disk to a file. Because the source disks use 520-byte sectors, a special tool is needed to convert all backup images from 520-byte to 512-byte sectors.
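The article does not name the conversion tool. A minimal sketch of such a conversion in Python might look like the following, assuming that each 520-byte sector holds 512 bytes of user data followed by 8 bytes of array metadata; that layout is an assumption and should be confirmed against the actual images. The function names are hypothetical.

```python
SRC_SECTOR = 520   # sector size on the source disks
DST_SECTOR = 512   # sector size expected by ordinary analysis tools

def to_512(raw):
    """Keep the first 512 bytes of every 520-byte sector (assumed layout)."""
    if len(raw) % SRC_SECTOR:
        raise ValueError("input length is not a multiple of 520 bytes")
    return b"".join(raw[off:off + DST_SECTOR]
                    for off in range(0, len(raw), SRC_SECTOR))

def convert_image(src_path, dst_path, sectors_per_chunk=4096):
    """Convert a whole image file chunk by chunk so it never has to fit in memory."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(SRC_SECTOR * sectors_per_chunk)
            if not chunk:
                break
            dst.write(to_512(chunk))
```

The conversion writes to a new file and never touches the original image, in keeping with the backup-first approach of this step.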

3. Analyze RAID group structure

LUNs on EMC storage are built on RAID groups, so the underlying RAID group must be analyzed first and then reconstructed from the information obtained. Analysis of each disk showed that disk 8 and disk 11 contain no data at all. The management interface shows that disk 8 and disk 11 are hot spares, and that hot spare disk 8 replaced the failed disk 5. It can therefore be concluded that although hot spare disk 8 was activated successfully, the RAID group was by then missing another disk; because RAID5 cannot rebuild with two members missing, no data was synchronized to disk 8. The other 10 disks were then analyzed to determine how data is distributed across them, the RAID stripe size, and the order of the disks.
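To show what stripe size and disk order mean in practice, the sketch below maps a logical data block to a disk and offset under a left-symmetric RAID5 layout. The layout is an assumption chosen for illustration; the article does not state which parity rotation the EMC array uses, and the real one has to be determined from the data itself. The function name and parameters are hypothetical.

```python
def locate_block(block, n_disks, stripe_unit):
    """Map a logical data block to (disk index, byte offset on disk, parity disk).

    Assumes a left-symmetric layout: parity starts on the last disk and moves
    one disk to the left on each row, and data begins on the disk after parity.
    """
    data_per_row = n_disks - 1          # one unit per row holds parity
    row = block // data_per_row
    k = block % data_per_row            # position of the block within its row
    parity_disk = (n_disks - 1) - (row % n_disks)
    disk = (parity_disk + 1 + k) % n_disks
    return disk, row * stripe_unit, parity_disk

# Example with 4 disks and a 64 KiB stripe unit (both values are illustrative).
for b in range(6):
    print(b, locate_block(b, n_disks=4, stripe_unit=64 * 1024))
```

If the guessed disk order or stripe unit is wrong, known structures such as file system headers will not appear at the expected logical offsets, which is how a guess can be checked and corrected.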

4. Analyze the order in which disks dropped out of the RAID group

Using the RAID information obtained above, the original RAID group was virtualized with a RAID virtualization program developed in-house by North Asia. However, because two disks dropped out of the RAID group, the order in which they dropped must be determined. Careful analysis of the data on each disk showed that, within the same stripes, the data on one disk clearly differed from that on the others, so this disk was tentatively identified as the first to drop. A RAID verification program developed in-house by North Asia confirmed that excluding this disk produced the best data, which identified it as the first disk to drop.
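North Asia's verification program is not public, so the following is only a sketch of the underlying RAID5 property it can rely on: the XOR of all members of a stripe row (data plus parity) is zero. Rows written after the first disk dropped fail this check, and the stale member can be recomputed from the others. All names here are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def stripe_consistent(blocks):
    """True when the XOR of every member of the row is all zeros."""
    return not any(xor_blocks(blocks))

def inconsistent_rows(rows):
    """Indexes of rows whose members no longer agree with the parity."""
    return [r for r, blocks in enumerate(rows) if not stripe_consistent(blocks)]

def rebuild_member(blocks, missing):
    """Recompute one member of a row from the XOR of all the others."""
    return xor_blocks([b for i, b in enumerate(blocks) if i != missing])
```

Parity alone shows which rows are inconsistent, not which disk is stale. To pick the stale disk, each candidate is excluded in turn, its content is rebuilt from the remaining members, and the resulting data is judged for quality, for example by whether file system structures parse. This matches the article's description of choosing the exclusion that gives the best data.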

5. Analyze LUN information in RAID group

Because LUNs are built on RAID groups, the RAID group must first be reassembled from the information obtained above. Then the LUN allocation information in the RAID group and the block map of the LUN are analyzed. Since there is only one LUN, only one set of LUN information needs to be analyzed. Based on this information, the North Asia RAID recovery (datahf.net) program was used to interpret the LUN's data map and export all of the LUN's data.
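The article does not describe the format of the LUN block map. As a rough sketch, suppose the map has already been decoded into a list of (offset in the RAID image, length) extents in LUN address order; that representation and the function name are assumptions made for illustration. Exporting the LUN then amounts to copying the extents out in order:

```python
def export_lun(raid, extents, out, chunk=1 << 20):
    """Copy each (start, length) extent from the RAID image to the output, in order.

    `raid` and `out` are binary file objects; `extents` is the decoded block map.
    """
    for start, length in extents:
        raid.seek(start)
        remaining = length
        while remaining:
            data = raid.read(min(chunk, remaining))
            if not data:
                raise EOFError("extent runs past the end of the RAID image")
            out.write(data)
            remaining -= len(data)
```

The output is a single flat image of the LUN, which is the form the file system analysis in the next step works on.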

6. Interpret and repair the ZFS file system

The ZFS file system interpreter developed in-house by North Asia data recovery (datahf.net) was used to interpret the file system in the exported LUN, but the program reported errors while interpreting some file system metafiles. Development engineers were promptly assigned to debug the program and find the cause of the errors, and file system engineers were assigned to check whether the program failed because it did not support this ZFS version. After 7 hours of analysis and debugging, it was found that some metafiles in the ZFS file system had been damaged by the sudden storage failure, which prevented the program from interpreting the file system.

This analysis established that some ZFS metafiles were damaged when the storage failed, so they had to be repaired before the file system could be parsed normally. Analysis of the damaged metafiles showed that the storage failed while ZFS was performing IO, leaving some metafiles only partially updated and therefore damaged. These metafiles were repaired manually so that the ZFS file system could be parsed properly.
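The article does not say how the damaged metafiles were identified. One general way to spot a damaged ZFS block is to recompute its checksum and compare it with the value stored in the block pointer that references it. Fletcher-4 is one of the checksum algorithms ZFS supports; whether this pool used it is an assumption. The sketch below computes Fletcher-4 over little-endian 32-bit words, and the helper name `block_intact` is hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1

def fletcher4(data):
    """Fletcher-4: four running 64-bit sums over the data's 32-bit words."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):   # little-endian assumed
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d

def block_intact(data, expected):
    """Compare the recomputed checksum with the one recorded for the block."""
    return fletcher4(data) == tuple(expected)
```

A block whose checksum does not match is a candidate for the kind of manual repair described above, while matching blocks can be trusted as written.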

7. Export all data

The program was used to parse the repaired ZFS file system, including all file nodes and the directory structure. Some screenshots of the file directory are as follows:

8. Verify the latest data

Because the data consists of text files and DCM images, verifying all of it would require setting up too many environments, so the user's engineers verified a sample of the data instead. The verification found no problems, and the data is complete. Some of the verified files are as follows:
