
How to solve the data loss caused by different failures of the server?

2026-02-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Server data recovery case 1. Data recovery after three hard drives went offline

The RAID 6 array in this case was built from six 750 GB disks. Two disks had already gone offline, but the maintenance staff did not replace them, so the array crashed as soon as a third disk went offline and all data became inaccessible. The server was a web server that ran a MySQL database and also stored a large number of other files. The administrator sought help from a data recovery company immediately after the data was lost, but after nearly a month of work by that company the files were still damaged or missing and the MySQL database was also seriously damaged. The administrator later contacted us on the recommendation of other operations and maintenance staff.

After learning the basic circumstances of the failure, our engineers first imaged all six disks to our secure storage pool and performed no further operations on the original disks, so that the customer's data stayed in its original state. Analysis of the images showed that two of the disks had gone offline so early that they no longer contained the latest data. RAID 6 uses double parity: the first parity block is generated by an ordinary XOR operation, and the second is generated by a Reed-Solomon code, which is considerably more complex mathematically. Because two disks in this array held stale data, their contents had to be recomputed from the remaining disks, and that requires the second parity as well as the first; otherwise the latest data would be lost or corrupted. As far as we know, no publicly available data recovery software handled this situation at the time; some tools list the feature, but in practice it did not work. This is the fundamental reason the other company could not recover all of the data.
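The double-parity idea can be illustrated with a minimal Python sketch. It is not the recovery software described in this case. It assumes the common convention of GF(2^8) arithmetic with the polynomial 0x11D and generator 2 (the one used by Linux md RAID 6); the customer's controller may have used different parameters. The function names (compute_pq, recover_two_data_blocks) are hypothetical, and only the situation where both stale blocks in a stripe are data blocks is covered.

```python
# GF(2^8) exponent/log tables for polynomial 0x11D with generator 2 (assumed convention).
EXP = [0] * 512
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 512):
    EXP[power] = EXP[power - 255]


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide a by b in GF(2^8)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def compute_pq(data_blocks):
    """Return (P, Q): P is the plain XOR parity, Q weights block i by 2**i."""
    size = len(data_blocks[0])
    p, q = bytearray(size), bytearray(size)
    for index, block in enumerate(data_blocks):
        coeff = EXP[index]
        for offset, byte in enumerate(block):
            p[offset] ^= byte
            q[offset] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two_data_blocks(blocks, x, y, p, q):
    """Recompute data blocks x and y; their entries in blocks are ignored."""
    size = len(p)
    p_xy, q_xy = bytearray(p), bytearray(q)
    for index, block in enumerate(blocks):
        if index in (x, y):
            continue
        coeff = EXP[index]
        for offset, byte in enumerate(block):
            p_xy[offset] ^= byte                 # leaves Dx ^ Dy
            q_xy[offset] ^= gf_mul(coeff, byte)  # leaves g^x*Dx ^ g^y*Dy
    g_x, g_y = EXP[x], EXP[y]
    denom = g_x ^ g_y
    d_x = bytes(gf_div(q_xy[i] ^ gf_mul(g_y, p_xy[i]), denom) for i in range(size))
    d_y = bytes(p_xy[i] ^ d_x[i] for i in range(size))
    return d_x, d_y


# One stripe of a six-disk array: four data blocks plus P and Q.
stripe = [b"web ", b"file", b"mysq", b"data"]
p, q = compute_pq(stripe)
d1, d3 = recover_two_data_blocks([stripe[0], None, stripe[2], None], 1, 3, p, q)
```

With XOR parity alone only one missing block per stripe can be recomputed, which is why two stale members force the use of the second parity.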

The data recovery engineers analyzed the parameters of the original RAID and then used full RAID 6 recovery software that we wrote ourselves to generate a complete image. The image was imported back onto storage that the customer had built with new disks, and the server booted normally. The administrator verified the data and found no problems, so the server data was recovered successfully.

Server data recovery case 2. Data recovery after two hard drives failed

The server in this case had four 18 GB hard drives configured as a RAID 5 array on a NetRaid array card. The operating system was Windows 2000 and the database was SQL Server 2000. While the server was working normally, the red light on one hard disk started flashing, although the machine kept running. Not long afterwards the system stopped working properly, and the red light on a second hard disk was found to be flashing as well.

The data recovery engineers first tested the server. During the power-on self-test, when the array card was being initialized, they pressed Ctrl+M to enter the NetRaid configuration utility. The array information showed the faulty hard disks with the status Failed, so they changed the configuration to force one of those disks to OnLine and restarted the server. This attempt did not work: the server did not get past the hardware self-test and failed to start.

The engineers started the server again and pressed Ctrl+M at the same point to re-enter the NetRaid configuration utility. They selected the disk array, manually set the disk that had just been forced OnLine back to Failed, manually set the other Failed disk to OnLine, and restarted the server, which this time booted into the system. After checking that the system and database were running normally, they used the array configuration tool to set the Failed disk to Rebuild, waited for the rebuild to reach 100%, and restarted the server. The array and the system were restored to their original state, and the server data was recovered successfully.
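The source does not explain why the first forced-online attempt failed. One plausible reading is that the disk chosen first had dropped out earlier and held stale data. The Python sketch below illustrates only the general RAID 5 principle behind the Rebuild step, namely that a missing member is the XOR of the surviving members, and shows how a stale survivor corrupts the result. It does not reproduce what the NetRaid firmware does internally, and the helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_member(stripe, missing_index):
    """Recompute one member of a RAID 5 stripe from all the other members."""
    survivors = [block for i, block in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


# One stripe on a four-disk RAID 5: three data blocks plus their XOR parity.
data = [b"SQL ", b"2000", b"rows"]
stripe = data + [xor_blocks(data)]

# With up-to-date survivors the missing block comes back intact.
rebuilt = rebuild_member(stripe, 1)

# If a survivor is stale (it dropped out earlier), the rebuilt block is wrong.
stale_stripe = list(stripe)
stale_stripe[0] = b"old!"
corrupted = rebuild_member(stale_stripe, 1)
```

This is why the order matters: the member that failed last should be forced online, and the stale member should be the one that is rebuilt.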

Server data recovery case 3. Data recovery after a server crash of unknown cause

This case involved an ordinary server with 20 hard drives. For unknown reasons the upper-layer business applications suddenly crashed. The machine room administrator checked the server and found that the main cause of the crash was that three of its hard drives had gone offline. The administrator removed all the hard drives from their slots, keeping track of the existing disk order, and took them to a data recovery center in Beijing.

After receiving the customer's hard disks, the data recovery engineers tested all 20 of them on data recovery test equipment and found that every drive could be identified. Fortunately, this meant no hardware repair was needed and removed the risk that physical damage too serious to repair would prevent the recovery. The engineers then imaged all of the drives. During imaging, the three disks that had gone offline in the original server imaged very slowly, which is related to why they dropped offline in the first place: most likely these three drives had a large number of bad tracks or unstable sectors, so they went offline in the normal server environment even though professional data recovery equipment could still identify them. The imaging policy was adjusted to handle the bad sectors until all of the hard drives had been imaged successfully.
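The article does not say which imaging policy was used. A common approach, sketched below in Python, is to read in large chunks, fall back to sector-by-sector reads when a chunk fails, zero-fill the sectors that cannot be read, and log them for later analysis. The function image_with_skips and the FlakyDisk class that simulates a failing drive are hypothetical names for illustration, not part of any real tool.

```python
import io


class FlakyDisk:
    """Simulated failing drive: any read touching a bad sector raises OSError."""

    def __init__(self, content, bad_sectors, sector_size=512):
        self.content = content
        self.bad_sectors = set(bad_sectors)
        self.sector_size = sector_size
        self.position = 0

    def seek(self, offset):
        self.position = offset

    def read(self, length):
        first = self.position // self.sector_size
        last = (self.position + length - 1) // self.sector_size
        if any(sector in self.bad_sectors for sector in range(first, last + 1)):
            raise OSError("unreadable sector")
        data = self.content[self.position:self.position + length]
        self.position += len(data)
        return data


def read_exact(source, offset, length):
    """Read exactly length bytes at offset, or raise OSError."""
    source.seek(offset)
    data = source.read(length)
    if len(data) != length:
        raise OSError("short read")
    return data


def image_with_skips(source, target, total_size, chunk_size=64 * 1024, sector_size=512):
    """Copy source to target; zero-fill unreadable sectors and return their ranges."""
    bad_ranges = []
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        try:
            data = read_exact(source, offset, length)
        except OSError:
            # The fast path failed, so retry this chunk one sector at a time.
            data = bytearray()
            for sector_offset in range(offset, offset + length, sector_size):
                sector_length = min(sector_size, offset + length - sector_offset)
                try:
                    data += read_exact(source, sector_offset, sector_length)
                except OSError:
                    data += bytes(sector_length)
                    bad_ranges.append((sector_offset, sector_length))
        target.seek(offset)
        target.write(data)
        offset += length
    return bad_ranges


# Simulate an 8-sector disk whose sector 3 cannot be read.
content = bytes(range(256)) * 16
image = io.BytesIO()
bad = image_with_skips(FlakyDisk(content, {3}), image, len(content), chunk_size=2048)
```

Keeping a list of the zero-filled ranges matters later: any file that overlaps one of them has to be treated as suspect when the recovered data is verified.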

Once all the disks had been imaged, the engineers opened the image files in a server data recovery tool for low-level analysis and worked out the disk order and parity information by reverse analysis of the ext3 file system. They used this information to reassemble the RAID array and, after consulting the users, extracted some Oracle dmp files. During the dmp import the database reported IMP-00008 errors. Careful analysis of the import log showed that the recovered dmp files themselves had problems, which caused the import to fail. The engineers immediately re-analyzed the RAID structure and further assessed how badly the ext3 file system was damaged. After several hours of work they recovered the dmp files and the original dbf database files and handed the recovered dmp files to the users for an import test. The test showed that the recovery was successful. The recovered dbf files were then checked, and all of them passed.
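The article gives no detail on how the ext3 structures were used. One common starting point, sketched below in Python, is to scan each disk image for ext2/ext3 superblocks, which carry the magic number 0xEF53 at byte offset 56 and the block-size exponent at byte offset 24. Which image holds the primary superblock, and where the backup copies fall, gives hints about member order and stripe size. The function find_ext_superblocks is a hypothetical name, and a real analysis would apply more sanity checks, since a two-byte magic can match by chance.

```python
import struct

EXT_MAGIC = 0xEF53          # s_magic value shared by ext2, ext3 and ext4
MAGIC_OFFSET = 56           # offset of s_magic inside the superblock
LOG_BLOCK_SIZE_OFFSET = 24  # offset of s_log_block_size inside the superblock
SUPERBLOCK_SIZE = 1024


def find_ext_superblocks(image, sector_size=512):
    """Return (offset, block_size) for each sector-aligned superblock candidate."""
    hits = []
    for start in range(0, len(image) - SUPERBLOCK_SIZE + 1, sector_size):
        (magic,) = struct.unpack_from("<H", image, start + MAGIC_OFFSET)
        if magic != EXT_MAGIC:
            continue
        (log_block_size,) = struct.unpack_from("<I", image, start + LOG_BLOCK_SIZE_OFFSET)
        if log_block_size <= 6:  # block size is 1024 << n; reject absurd values
            hits.append((start, 1024 << log_block_size))
    return hits


# Build a tiny fake image with one superblock at the usual 1024-byte offset.
fake_image = bytearray(8192)
struct.pack_into("<I", fake_image, 1024 + LOG_BLOCK_SIZE_OFFSET, 2)  # 4096-byte blocks
struct.pack_into("<H", fake_image, 1024 + MAGIC_OFFSET, EXT_MAGIC)
candidates = find_ext_superblocks(fake_image)
```

Running such a scan over every member image and comparing the hit offsets is only a first step; the disk order and parity rotation still have to be confirmed against directory and inode data.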

The engineers asked the customer to verify the recovery result. After the customer confirmed that all the data had been recovered, a new RAID array was built on the server and, with the engineers' help, all the recovered data was migrated back to the customer's server. The server data was recovered successfully.

