Data recovery after a mutual-exclusion failure on fibre-channel shared storage on a SUN platform server

Updated 2025-03-27. Source: SLTechnology News&Howtos (shulou)

Failure description:

The servers were designed as a cluster: two SPARC Solaris systems share the same storage through a fibre-channel switch. Under normal circumstances server A does the work. If server A fails and goes down, it is powered off and server B is powered on to take over. Because the servers were configured incorrectly, however, the two servers were not mutually exclusive in their access to the storage.

During a routine maintenance check, the administrator started server B and found that it was connected to a group of unknown high-capacity disks. Because server B had never been put into service and was idle, the administrator assumed the disk was unused and ran newfs on a partition of that disk. The disk was in fact the shared storage, so server A quickly raised alarms and went down.

The administrator then restarted server A, but none of its file systems could be mounted. He ran fsck, which repaired most of the partitions successfully. Only the file system that had been through newfs on server B could not be repaired satisfactorily: its root directory contained nothing but a lost+found folder holding a large number of numbered files. The failed file system stored two sets of Oracle instances, its original structure was UFS, and about 200,400 data files needed to be restored.

Data recovery analysis: sharing conflicts are common with fibre-channel equipment because of the flexibility of fibre-channel switching. In this case it was very damaging for server A and server B to access UFS, a single-host file system, at the same time, because each server managed the storage as if it had exclusive access. The file system that server A was managing normally had already been initialized by server B, and the data that server A then flushed from its buffers to the file system in turn damaged the result of server B's initialization.

The newfs on server B acted directly on the original file system, but this case differs somewhat from a simple newfs. Before server A went down, it wrote a small amount of data (including metadata) back to the file system. If newfs created the same structure as before, the data area is not destroyed, and if a small amount of metadata survives, partial recovery is still possible.

UFS is a traditional UNIX file system. It is divided into block groups, and each block group is assigned a fixed inode area. When newfs is run with the same structure as before, the inode areas, which are the most important part of the file system, are reinitialized and the previous inode contents are not retained. Inodes hold the important attributes of every file, so recovering the data purely at the file-system level is very difficult. On the other hand, Oracle data files carry their own internal description, such as table names, from which the original disk file names can be inferred.
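The observation that a newfs with the same structure wipes the inode areas but leaves the data area alone can be illustrated with a deliberately simplified model. The sketch below assumes a hypothetical layout in which every block group has one inode area at a fixed offset. The class, its field names, and the sizes used are illustrative assumptions, not real UFS on-disk parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical, simplified block-group layout (all values in bytes)."""
    group_size: int          # size of one block group
    inode_area_offset: int   # where the inode area starts inside each group
    inode_area_size: int     # size of the inode area in each group
    total_size: int          # size of the whole file system


def inode_areas(layout):
    """Return the (start, end) byte range of the inode area in every group."""
    areas = []
    for group_start in range(0, layout.total_size, layout.group_size):
        start = group_start + layout.inode_area_offset
        if start >= layout.total_size:
            break
        end = min(start + layout.inode_area_size, layout.total_size)
        areas.append((start, end))
    return areas


def overlap_bytes(ranges_a, ranges_b):
    """Count the bytes that are covered by both lists of ranges."""
    total = 0
    for start_a, end_a in ranges_a:
        for start_b, end_b in ranges_b:
            total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total


if __name__ == "__main__":
    old = Layout(group_size=1024, inode_area_offset=64,
                 inode_area_size=128, total_size=4096)
    new = old  # newfs run with the same parameters as the original
    old_areas = inode_areas(old)
    wiped = overlap_bytes(old_areas, inode_areas(new))
    inode_total = sum(end - start for start, end in old_areas)
    print(f"old inode bytes reinitialized: {wiped} of {inode_total}")
```

With identical parameters, every old inode area coincides with a new one, so all of the old inode contents are reinitialized while the bytes between the inode areas are not touched by this step. A real file system also writes superblocks and per-group headers, which this model leaves out.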

Data recovery process and results: first, a dd backup of the failed file system was made. The whole image file was then analyzed and reorganized according to the Oracle data structures. For the files that were too fragmented to reorganize this way, the structural characteristics of the UFS file system were used as an auxiliary reference. Finally, the recovered data files and control files were used to restore the databases on the Oracle platform. All databases were fully restored.
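As a rough illustration of what reorganizing an image by Oracle data structures can involve, the sketch below groups blocks by the relative file number stored in each block header. It is not the tool used in this case. It assumes that bytes 4 to 7 of every Oracle data block hold a relative data block address whose upper 10 bits are the file number and lower 22 bits the block number, that the values are big-endian (as on SPARC), and that the block size is 8192 bytes. The function names are hypothetical.

```python
import struct


def parse_rdba(block, big_endian=True):
    """Return (relative file number, block number) from a block header.

    Assumes the 4-byte relative data block address sits at offset 4.
    """
    fmt = ">I" if big_endian else "<I"
    (rdba,) = struct.unpack_from(fmt, block, 4)
    return rdba >> 22, rdba & 0x3FFFFF


def group_blocks(image, block_size=8192, big_endian=True):
    """Map file number -> {block number: byte offset in the image}.

    `image` is a bytes object here for simplicity; a real image would be
    read from disk piece by piece. Blocks whose first 8 bytes are all zero
    are treated as empty and skipped.
    """
    groups = {}
    for offset in range(0, len(image) - block_size + 1, block_size):
        header = image[offset:offset + 8]
        if header == bytes(8):
            continue
        file_no, block_no = parse_rdba(header, big_endian)
        groups.setdefault(file_no, {})[block_no] = offset
    return groups


def rebuild_order(groups, file_no):
    """Return the image offsets of one file's blocks in block-number order."""
    blocks = groups.get(file_no, {})
    return [blocks[n] for n in sorted(blocks)]
```

A practical tool would also validate each candidate block (for example its type byte and checksum) before trusting the address, since arbitrary data can look like a header.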

Postscript:

fsck is a dangerous operation because it modifies the file system it repairs, so it is best to make a backup (dd) before running it. Missing mutual exclusion on fibre-channel storage is the cause of many data disasters, so such a scheme should be deployed and implemented carefully.
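The advice to image the device before running fsck can be sketched as a chunked copy that also records a checksum, so that later work on the image can be verified against the original read. This does roughly what dd with an input and an output file does. The function name and the default chunk size are assumptions made for illustration.

```python
import hashlib


def make_image(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path in chunks.

    Returns (bytes copied, SHA-256 hex digest of the data that was read).
    The source is opened read-only, so it is never modified.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()
```

All analysis and any repair attempt can then be run against the image, or against a further copy of it, while the original device stays untouched.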
