S5020 Fibre Channel storage FC hard disk failure: a successful data recovery case, method and process


2026-02-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This case describes in detail how a database on server storage was recovered, including RAID reassembly and the repair and verification of the database data.

Background introduction:

The storage is an S5020 Fibre Channel array holding 16 FC hard disks of 600 GB each. The fault lights of disks 10 and 13 on the front panel are lit, the volume that the storage maps to the Red Hat host can no longer be mounted, and the business application is down.

Recovery work:

Connect to the storage through Storage Manager to view its current status. The storage reports the logical volume as failed. Checking the physical disks shows that disk 6 reports "warning" and disks 10 and 13 report "failure". Back up the complete current storage log through Storage Manager; parsing the backed-up log yields some information about the logical volume structure.

Figure 1:

Label the 16 FC disks, record each one against its original slot number, and remove them from the storage. A rough test on the FC disk imaging rig "R510+SUN3510" shows that all 16 disks are recognized normally. The SMART status of each disk is then checked, and the SMART status of disk 6 agrees with what IBM Storage Manager reported.

In the Windows environment, the FC disks recognized by the rig are marked offline in Disk Management, which write-protects the original disks. Software is then used to image each original disk at sector level, copying every physical sector to a file on a logical disk. During imaging, disk 6 turns out to be very slow. Combined with the earlier SMART findings, this suggests that disk 6 has a large number of damaged and unstable sectors that ordinary application software cannot handle.

A dedicated bad-sector disk imager is therefore used for disk 6, and the imaging speed and stability are watched throughout. Disk 6 turns out to have few truly bad sectors but many unstable ones, for example sectors with long read response times. The copy strategy for disk 6 is adjusted accordingly, changing parameters such as the number of sectors to skip after a bad sector and the response timeout, and imaging of disk 6 continues. Meanwhile, the imaging of the remaining disks is monitored.
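The skip-ahead part of such a copy strategy can be sketched in Python. This is only an illustration, not the imager used in this case: the function name, the 512-byte sector size, and the `chunk` and `skip` values are assumptions, and the response timeout mentioned above is not modeled because it is enforced by the imaging hardware.

```python
SECTOR = 512  # assumed sector size in bytes


def image_with_skips(src, dst, total_sectors, chunk=128, skip=256):
    """Copy src to dst in chunks of `chunk` sectors.

    On a read error the affected range is zero-filled in the image and the
    copy jumps ahead `skip` sectors, so a weak area is not hammered.
    Returns a list of (first_sector, sector_count) ranges that were skipped.
    """
    skipped = []
    pos = 0
    while pos < total_sectors:
        count = min(chunk, total_sectors - pos)
        try:
            src.seek(pos * SECTOR)
            data = src.read(count * SECTOR)
            if len(data) != count * SECTOR:
                raise OSError("short read")
        except OSError:
            # Give up on this area for now: placeholder zeros, note the range.
            count = min(skip, total_sectors - pos)
            data = b"\x00" * (count * SECTOR)
            skipped.append((pos, count))
        dst.seek(pos * SECTOR)
        dst.write(data)
        pos += count
    return skipped
```

The returned list matters as much as the image itself: it records exactly which ranges in the image file are placeholders rather than real data.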

When the imaging jobs finish, all disks other than disk 6 have been imaged. The logs show that disk 1 also has bad sectors, although neither Storage Manager nor its SMART status reported any error, and that disks 10 and 13 have a large number of irregularly distributed bad sectors. Locating the listed bad sectors in the target image files shows that some key metadata of the ext3 file system has been destroyed by them. The only option is to wait for disk 6 to finish imaging, then rebuild the damaged blocks by XOR across the same stripe, and manually repair the remaining file system damage from the surrounding file system context.
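The XOR step relies on a basic property of single-parity RAID: within one stripe, any one block equals the XOR of all the other blocks, parity included. A minimal Python sketch follows; the function names are illustrative, and it assumes that only one member of the stripe is unreadable at that offset.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    acc = 0
    for b in blocks:
        acc ^= int.from_bytes(b, "big")
    return acc.to_bytes(size, "big")


def rebuild_missing(stripe, missing_index):
    """Rebuild stripe[missing_index] from the other stripe members.

    `stripe` holds the blocks at the same offset on every member disk
    (data blocks plus the parity block); the damaged entry is ignored.
    """
    others = [b for i, b in enumerate(stripe) if i != missing_index]
    return xor_blocks(others)
```

If two members of a stripe are bad at the same offset, XOR cannot recover either of them, which is why the damage that remains has to be repaired by hand from file system context.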

The bad-sector imager reports that disk 6 is done. However, the earlier copy strategy, chosen to maximize the number of good sectors read and to protect the heads, automatically skipped some unstable sectors, so the current image is incomplete. The copy strategy is adjusted again to go back over the skipped sectors until all sectors of disk 6 have been imaged.
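A second pass of this kind can be sketched as follows. It assumes the first pass left a list of skipped ranges as (first_sector, sector_count) pairs and an image file with placeholders in those ranges; the function name and the retry count are illustrative, not taken from the imager used here.

```python
def retry_skipped(src, dst, skipped, sector=512, retries=3):
    """Re-read previously skipped ranges one sector at a time.

    Sectors that read successfully are written into the image at their
    original offset. Returns the sector numbers that still failed.
    """
    still_bad = []
    for first, count in skipped:
        for lba in range(first, first + count):
            for _ in range(retries):
                try:
                    src.seek(lba * sector)
                    data = src.read(sector)
                except OSError:
                    continue  # try this sector again
                if len(data) == sector:
                    dst.seek(lba * sector)
                    dst.write(data)
                    break
            else:
                # No attempt succeeded; leave the placeholder in the image.
                still_bad.append(lba)
    return still_bad
```

Reading one sector at a time is slow, but it confines each failure to the single sector that is actually unreadable instead of discarding a whole chunk.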

With physical sector images of all the disks in hand, all the image files are opened with software on the platform. Reverse analysis of the ext3 file system and of the log files yields the order of the 16 FC disks in the array, the RAID block size, the parity rotation direction and layout, and so on. The RAID is then virtually reassembled in software, and the ext3 file system on the reassembled RAID is analyzed further. After consulting the user, some Oracle dmp files are extracted and the user tries to restore from them.
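Once disk order, block size, and parity rotation are known, virtual reassembly comes down to mapping each logical block to a member image and an offset. The sketch below assumes a RAID 5 array with one of two common left-rotating parity layouts; the article does not state which layout this array actually used, and the function names are illustrative.

```python
def locate_block(logical_block, n_disks, layout="left-symmetric"):
    """Map a logical data block to (disk_position, row, parity_position).

    Parity starts on the last disk and moves one disk to the left per row.
    """
    row, k = divmod(logical_block, n_disks - 1)
    parity = (n_disks - 1) - (row % n_disks)
    if layout == "left-symmetric":
        # Data continues on the disk after parity and wraps around.
        disk = (parity + 1 + k) % n_disks
    elif layout == "left-asymmetric":
        # Data always runs from disk 0 upward, stepping over parity.
        disk = k if k < parity else k + 1
    else:
        raise ValueError("unknown layout: " + layout)
    return disk, row, parity


def read_logical_block(images, logical_block, block_size,
                       layout="left-symmetric"):
    """Read one logical block from image files sorted into array order."""
    disk, row, _ = locate_block(logical_block, len(images), layout)
    images[disk].seek(row * block_size)
    return images[disk].read(block_size)
```

A wrong disk order or layout usually shows up quickly: file system structures that span a block boundary stop lining up, which is how the parameters are checked against the ext3 metadata.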

During the dmp import, the database reports an IMP-00008 error. Careful analysis of the import log shows that the recovered dmp file itself is faulty, which causes the import to fail. The RAID structure is immediately re-analyzed and the extent of the damage to the ext3 file system is determined more precisely. After several hours of work, the dmp files and the original dbf database files are recovered. The recovered dmp file is handed to the user for a test import, which succeeds. The recovered dbf files are then checked, and all of them pass.

Database recovery process

1. Copy the database files to the original database server under /home/oracle/tmp/syntong as a backup. Create an oradata folder under the root directory and copy the entire backed-up syntong folder into it. Then change the group and permissions of the oradata folder and all files in it.

2. Back up the original database environment, including the relevant files under the product folder in ORACLE_HOME. Configure the listener and connect to the database with sqlplus on the original machine. Start the database to the nomount state; basic status queries show nothing wrong with the environment or the parameter file. Start the database to the mount state; status queries again show no problem. Starting the database to the open state produces an error:

Figure 2:

3. Further checking and analysis determines that the fault is an inconsistency between the control file and the data file information, a common fault after a power outage or sudden shutdown.

4. The database files are checked one by one, and no physical damage is found in any data file.

5. In the mount state, back up the control file with alter database backup controlfile to trace as '/backup/controlfile'; then view and edit the resulting trace file to obtain the commands for rebuilding the control file. Copy these commands into a new script file, controlfile.sql.

6. Shut down the database and delete the 3 control files under /oradata/syntong/. Start the database to the nomount state and execute the controlfile.sql script.

Figure 3:

7. After the control file has been rebuilt, starting the database directly reports an error, so further processing is needed.

Figure 4:

Then run the recovery command:

Figure 5:

Perform media recovery until it reports that recovery is complete.

8. Try to open the database.

SQL> alter database open resetlogs;

9. The database starts successfully. Add the data files of the original temp tablespace back to the corresponding temp tablespace.

10. Run the routine checks on the database; no errors are found.

Make an exp backup. The full-database export completes without errors. Connect the application to the database for data verification at the application level.

Data verification is complete, the database has been repaired, and the data recovery is successful.

