RAID reorganization and Database data repair and Verification

2026-02-10 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Background:

The system is an IBM DS5020 Fibre Channel storage array holding 16 FC hard drives of 600 GB each. The fault lights on drives 10 and 13 on the front panel were lit, the volume that the array mapped to the Red Hat host could no longer be mounted, and the business applications went down.

Recovery process:

We connected to the array with IBM Storage Manager to check its current state. The array reported the logical volume as failed. Checking the physical disks showed disk 6 reporting "warning" and disks 10 and 13 reporting "failure". We used IBM Storage Manager to back up the array's complete log, and parsing that backup yielded some information about the logical volume structure.

All 16 FC disks were labeled, recorded by original slot number, and removed from the array. They were given a rough test on North Asia Data Recovery's FC disk imaging rig, "DELL R510+SUN3510". All 16 disks were recognized normally. The SMART status of each disk was then checked, and the SMART status of disk 6 matched the warning reported in IBM Storage Manager.

Under Windows, the FC disks recognized by the rig were marked offline in Disk Management, which write-protects the original disks. WinHex was then used to image each original disk at the sector level: every physical sector was copied to a logical disk under Windows and saved as a file. During imaging, disk 6 copied very slowly. Combined with the problem found earlier in its SMART status, this suggested that disk 6 had a large number of damaged and unstable sectors that general-purpose Windows software could not handle.

Disk 6 was therefore moved to a dedicated imaging device designed for disks with bad sectors, and the imaging speed and stability were watched throughout. Disk 6 turned out to have few truly bad sectors but many unstable ones with long read response times. The copy strategy for disk 6 was adjusted accordingly by changing parameters such as the number of sectors to skip after a bad sector and the response timeout, and imaging continued. Meanwhile, we kept an eye on the WinHex imaging of the remaining disks under Windows.

All the disks imaged with WinHex on the Windows platform finished. The WinHex logs showed that disk 1 also had bad sectors, which neither IBM Storage Manager nor the disk's SMART status had reported, and that disks 10 and 13 had a large number of irregularly distributed bad sectors. Using the bad-sector list, we located the corresponding positions in the target image files with WinHex and found that some key ext3 file system metadata had been destroyed by bad sectors. The only option was to wait for disk 6 to finish imaging, XOR across the same stripe to rebuild the damaged blocks, and then manually repair the damaged file system from its surrounding context.
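The XOR step can be illustrated with a minimal sketch. It assumes a single-parity stripe in which the parity block is the XOR of all data blocks, so any one unreadable block equals the XOR of the remaining blocks in the same stripe. The function names and sample bytes are hypothetical and are not taken from the tools used in this case.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    result = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def rebuild_missing(stripe, missing_index):
    """Rebuild one unreadable stripe member from all the other members.

    stripe holds the blocks at the same offset on every member disk
    (data blocks plus one parity block); the entry at missing_index is
    ignored because it is the one being rebuilt.
    """
    survivors = [b for i, b in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    # Hypothetical 4-byte blocks standing in for real stripe units.
    data = [b"\x11\x22\x33\x44", b"\xa0\xb0\xc0\xd0", b"\x0f\x0e\x0d\x0c"]
    stripe = data + [xor_blocks(data)]  # last member is parity
    print(rebuild_missing(stripe, 1) == data[1])
```

This only works when exactly one member of a given stripe is unreadable at that offset, which is why a complete image of disk 6 matters before the damaged regions on other disks can be rebuilt.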

The bad-sector imaging device reported that disk 6 was complete. However, the earlier copy strategy, set to maximize the number of good sectors read and to protect the heads, had automatically skipped some unstable sectors, so the image was still incomplete. The copy strategy was adjusted to go back over the skipped sectors until all sectors of disk 6 had been imaged.
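The two-pass strategy described here (skip ahead on the first pass, then return to the skipped sectors) can be sketched as follows. This is an illustration under stated assumptions, not the behaviour of the imaging device used in this case: `FlakyDisk` is a hypothetical in-memory stand-in for an unstable drive, and `skip_after_error` and `max_retries` are made-up parameter names.

```python
class FlakyDisk:
    """Hypothetical stand-in for a failing drive.

    fail_counts maps a sector number to how many reads of that sector
    fail before one succeeds.
    """

    def __init__(self, sectors, fail_counts):
        self.sectors = sectors
        self.fail_counts = dict(fail_counts)

    def read(self, lba):
        remaining = self.fail_counts.get(lba, 0)
        if remaining > 0:
            self.fail_counts[lba] = remaining - 1
            raise OSError(f"read error at sector {lba}")
        return self.sectors[lba]


def first_pass(disk, total, skip_after_error):
    """Copy forward; on an error, jump ahead and record what was skipped."""
    image, skipped = {}, []
    lba = 0
    while lba < total:
        try:
            image[lba] = disk.read(lba)
            lba += 1
        except OSError:
            end = min(lba + skip_after_error, total)
            skipped.extend(range(lba, end))
            lba = end
    return image, skipped


def retry_pass(disk, image, skipped, max_retries):
    """Revisit skipped sectors one by one, retrying each a few times."""
    still_bad = []
    for lba in skipped:
        for _ in range(max_retries):
            try:
                image[lba] = disk.read(lba)
                break
            except OSError:
                continue
        else:
            still_bad.append(lba)  # never read successfully
    return still_bad
```

Skipping a run of sectors after each error keeps the first pass from hammering a weak region, at the cost of leaving readable sectors behind; the second pass recovers those, and whatever remains in `still_bad` has to be rebuilt by other means.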

With physical sector images of all the disks in hand, we opened all the image files in WinHex under Windows. Reverse analysis of the ext3 file system and the log files gave us the order of the 16 FC disks in the array, the RAID block size, the RAID parity rotation direction and mode, and so on. We then virtually reassembled the RAID in software and, once that was done, parsed the ext3 file system. After consulting with the user, we extracted some Oracle dmp files and the user attempted to restore from them.
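To show why disk order, block size, and parity rotation are all needed before a virtual reassembly, here is a minimal address-mapping sketch. The article does not state the RAID level or layout of this array, so the left-symmetric RAID 5 rotation used below is purely an assumption for illustration, as are the names `n_disks` and `chunk_bytes` and the choice to ignore any reserved area at the start of each disk.

```python
def locate_chunk(logical_chunk, n_disks):
    """Map a logical chunk number to (stripe, data_disk, parity_disk).

    Assumes a left-symmetric RAID 5 layout: parity starts on the last
    disk and moves one disk to the left each stripe, and data chunks
    start on the disk right after parity, wrapping around.
    """
    data_per_stripe = n_disks - 1
    stripe, k = divmod(logical_chunk, data_per_stripe)
    parity_disk = (n_disks - 1) - (stripe % n_disks)
    data_disk = (parity_disk + 1 + k) % n_disks
    return stripe, data_disk, parity_disk


def locate_byte(logical_offset, n_disks, chunk_bytes):
    """Map a logical byte offset to (disk_index, byte offset on that disk)."""
    chunk, within = divmod(logical_offset, chunk_bytes)
    stripe, data_disk, _parity = locate_chunk(chunk, n_disks)
    return data_disk, stripe * chunk_bytes + within


if __name__ == "__main__":
    # Print the data layout of the first four stripes of a 4-disk set.
    for chunk in range(12):
        print(chunk, locate_chunk(chunk, 4))
```

With a wrong disk order, chunk size, or rotation, the same logical offset resolves to a different image file and position, which is why file system structures that span chunk boundaries are a useful check on candidate parameters.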

During the dmp import, Oracle reported an IMP-00008 error. North Asia's Oracle engineer carefully analyzed the dmp import log and found a problem with the recovered dmp file itself, which caused the import to fail. We immediately re-analyzed the RAID structure and reassessed the extent of damage to the ext3 file system. After several hours of work we recovered the dmp files and the original dbf database files again, and handed the recovered dmp files to the user for a test import. The test showed that the data had been recovered successfully. We then checked the recovered dbf files, and all of them passed.

North Asia's database engineer arrived on site and, after discussion with the users, it was decided to rebuild the database from the recovered original dbf files so that the data would be restored to the best possible state.

Database recovery process

1. Copy the database files to the original database server under /home/oracle/tmp/syntong as a backup. Create an oradata folder under the root directory and copy the entire backed-up syntong folder into it. Then change the group and permissions of the oradata folder and all its files.

2. Back up the original database environment, including the related files under the product folder under ORACLE_HOME. Configure the listener and connect to the database with sqlplus on the original machine. Start the database to the nomount state; a basic status query shows nothing wrong with the environment or the parameter file. Start the database to the mount state; the status query again shows no problem. Start the database to the open state. An error occurred:

ORA-01122: database file 1 failed verification check

ORA-01110: data file 1: '/oradata/syntong/system01.dbf'

ORA-01207: file is more recent than control file - old control file

3. Further analysis determined that the fault was an inconsistency between the control file and the data file information, a common fault after a power failure or sudden shutdown.

4. The database files were checked one by one, and no physical damage was found in any data file.

5. In the mount state, back up the control file with alter database backup controlfile to trace as '/backup/controlfile'; then view and edit the backup to obtain the commands that rebuild the control file. Copy these commands into a new script file, controlfile.sql.

6. Shut down the database and delete the 3 control files under /oradata/syntong/. Start the database to the nomount state and execute the controlfile.sql script.

SQL> startup nomount

SQL> @controlfile.sql

7. After the control file is rebuilt, opening the database directly reports an error that needs further handling.

SQL> alter database open;

alter database open

ERROR at line 1:

ORA-01113: file 1 needs media recovery

ORA-01110: data file 1: '/free/oracle/oradata/orcl/system01.dbf'

Then execute the recovery command:

SQL> recover database using backup controlfile until cancel

Recovery of Online Redo Log: Thread 1 Group 1 Seq 22 Reading mem 0

Mem# 0 errs 0: /free/oracle/oradata/orcl/redo01.log

...

Continue media recovery until it reports that recovery is complete.

8. Try to open the database.

SQL> alter database open resetlogs;

9. The database started successfully. Add the data files of the original temp tablespace to the corresponding temp tablespace.

10. Run the routine checks on the database; no errors were found.

11. Make an exp backup. The full database export completed with no errors. Connect the application to the database for application-level data verification.
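Step 5 involves pulling the control file rebuild commands out of the trace backup by hand. A minimal sketch of that extraction is shown below. It assumes the trace is plain text in which each CREATE CONTROLFILE statement starts on its own line, names NORESETLOGS or RESETLOGS on that first line, and ends at the first line that finishes with a semicolon. The function name and the sample text are hypothetical, and real trace files should still be reviewed manually before anything is run.

```python
def extract_create_controlfile(trace_text, variant):
    """Return the CREATE CONTROLFILE statement for one variant.

    variant is "NORESETLOGS" or "RESETLOGS". Blank lines and comment
    lines (starting with "--") inside the statement are dropped.
    """
    statement = []
    collecting = False
    for line in trace_text.splitlines():
        stripped = line.strip()
        if not collecting:
            tokens = stripped.upper().split()
            # Match whole tokens so RESETLOGS does not match NORESETLOGS.
            if tokens[:2] == ["CREATE", "CONTROLFILE"] and variant in tokens:
                collecting = True
            else:
                continue
        if not stripped or stripped.startswith("--"):
            continue
        statement.append(line.rstrip())
        if stripped.endswith(";"):
            return "\n".join(statement)
    raise ValueError(f"no complete CREATE CONTROLFILE {variant} statement found")


if __name__ == "__main__":
    # Hypothetical, heavily shortened trace text for illustration only.
    sample = """\
-- Set #1. NORESETLOGS case
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "DEMO" NORESETLOGS NOARCHIVELOG
LOGFILE
  GROUP 1 '/demo/redo01.log' SIZE 50M
DATAFILE
  '/demo/system01.dbf'
;
"""
    print(extract_create_controlfile(sample, "NORESETLOGS"))
```

The article does not say which of the two variants was copied into controlfile.sql, so the variant is left as an explicit argument rather than a default.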

Data verification finished, the database repair was complete, and the data recovery was successful.

