Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

A successful and failed experiment on how to use file descriptors to recover under Linux

2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article will explain in detail how to use file descriptors to recover successful and failed experiments under Linux. The editor thinks it is very practical, so I share it with you as a reference. I hope you can get something after reading this article.

Mistakenly deleting data is a "low-level mistake" often encountered by junior operation and maintenance personnel, and some experienced veterans sometimes "stumble" when they are tired and uncalm. Once the data file is deleted by mistake, it is the first priority to restore the database by the least impact and the quickest means as soon as possible.

There are many ways to recover data, such as hot and cold backup, flashback database and so on. If the data file is deleted directly from the OS level of the operating system, there are some preferred methods that can be used in the Linux/Unix environment. One is the file descriptor (File Description).

1. Talk about File Description

Different operating systems adopt different ways to achieve CPU management, memory management and storage file management.

In Linux and Unix, file descriptors are used for file management. When a process wants to open a file, it calls the kernel function of the operating system, and the kernel returns a file descriptor. Read and write operations on the file are also performed through this descriptor. When the operating system deletes a file, it is necessary to make sure that all file descriptors are released before it is finally deleted.

If our misoperation occurs in a running database system, the file is deleted and invisible on the operating system. However, there are some existing file descriptors in the database Oracle process. By borrowing these file descriptors, it is possible for us to find the file information and recover the data files.

Therefore, once a mistakenly deleted action occurs, avoid three points: calm mind, disconnect the application, and maintain the site does not close the library.

However, in practice, this technology that can recover quickly is not 100% successful. Using file descriptors to recover data requires that the database cannot be shut down, the database process cannot "automatically cull" data files, and so on. This paper introduces the operation method and premise from two experiments.

2. Setting up the experimental environment.

We chose Linux 2.6 kernel and Oracle 11g to experiment. Note: such an experiment must not be carried out in a production environment. When conducting experiments, a complete backup is also required.

[oracle@bspdev] $uname-a

Linux bspdev.localdomain 2.6.18-308.el5 # 1 SMP Tue Feb 21 20:05:41 EST 2012 i686 i686 i386 GNU/Linux

SQL > select * from v$version

BANNER

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0-Production

PL/SQL Release 11.2.0.1.0-Production

CORE11.2.0.1.0Production

TNS for Linux: Version 11.2.0.1.0-Production

NLSRTL Version 11.2.0.1.0-Production

Create dedicated tablespaces, users, and data tables for experimentation.

SQL > create tablespace rmdtest datafile'/ u01 size oradata size 1000m

2 extent management local uniform size 1m

3 segment space management auto

Tablespace created

SQL > create user rmtest identified by rmtest default tablespace rmdtest

User created

SQL > grant resource, connect to rmtest

Grant succeeded

SQL > grant select any dictionary to rmtest

Grant succeeded

SQL > create table rm_tab as select * from dba_objects

Table created

SQL > insert into rm_tab select * from rm_tab

72731 rows inserted

SQL > commit

Commit complete

Datasheet T, in the tablespace file of the experimental environment.

SQL > select tablespace_name, bytes/1024/1024 M from dba_segments where owner='RMTEST' and segment_name='RM_TAB'

TABLESPACE_NAME M

RMDTEST 17

Make sure you have a good backup!

RMAN > list backup

List of Backup Sets

=

BS Key Type LV Size Device Type Elapsed Time Completion Time

135 Full 1.39G DISK 00:03:13 02-FEB-14

BP Key: 135 Status: AVAILABLE Compressed: NO Tag: TAG20140202T012300

(for reasons of space, there are omissions. )

Piece Name:

Piece Name: / u01/flash_recovery_area/WILSON/autobackup/2014_02_02/o1_mf_s_838430779_9gtckx4s_.bkp

SPFILE Included: Modification time: 02-FEB-14

SPFILE db_unique_name: WILSON

Control File Included: Ckp SCN: 5370719 Ckp time: 02-FEB-14

The following two experiments are conducted to simulate the scenario in which the data file is deleted in the running state. Note: due to operating system differences, there is no scenario of "running open files deleted" under Windows. Therefore, in Linux/AIX, erroneous deletion is more likely to occur.

3. An unsuccessful experiment

The first is an unsuccessful experiment. In the operation and maintenance production environment, our principle is always stable. Dangerous and uncertain situations must be avoided. Even if you don't do it, fix it, or optimize it, don't risk the availability of your business system.

In the experimental environment, we can always find a lot of knowledge and phenomena. First, we try to delete the data file and confirm the location of the file.

[oracle@bspdev ~] $cd / u01/oradata/WILSON/datafile/

[oracle@bspdev datafile] $ls-l | grep rmdtest01.dbf

-rw-r- 1 oracle oinstall 1048584192 Feb 2 02:14 rmdtest01.dbf

Delete the file.

[oracle@bspdev datafile] $rm rmdtest01.dbf

[oracle@bspdev datafile] $ls-l

Total 8121892

-rw-r- 1 oracle oinstall 10493952 Feb 2 02:14 mvtbltest01.dbf

(for reasons of space, there are omissions. )

-rw-r--r-- 1 oracle oinstall 10493952 Feb 2 02:14 tts_simple01.dbf

Note: although the data file is deleted from OS at this time, we can still query the data!

SQL > select count (*) from rmtest.rm_tab

COUNT (*)

-

145462

For this phenomenon, the author thinks that there are two possibilities, one is the cached data information in buffer cache, which can support this kind of query action. Another possibility is that due to the existence of file descriptors, the data file is not really deleted, but also exists in the system in some way, supporting dbwr query.

Prove two kinds of conjecture, clean up the buffer cache.

SQL > alter system flush buffer_cache

System altered

SQL > alter system flush shared_pool

System altered

-- you can still query the results.

SQL > select count (*) from rmtest.rm_tab

COUNT (*)

-

145462

However, if the database somehow finds that the file has been deleted, such as the check point action, it will cause a lot of automated actions to occur.

SQL > alter system checkpoint

System altered

-- the information in alert log is as follows:

Sun Feb 02 02:27:51 2014

Beginning global checkpoint up to RBA [0x1ef.2532.10], SCN: 5374358

Errors in file / u01/diag/rdbms/wilson/wilson/trace/wilson_ckpt_4814.trc:

ORA-01171: datafile 11 going offline due to error advancing checkpoint

ORA-01116: error in opening database file 11

ORA-01110: datafile 11:'/ u01Placement Oradata Universe WILSON Universe datafileUniverse rmdtest01.dbf'

ORA-27041: unable to open file

Linux Error: 2: No such file or directory

Additional information: 3

Completed checkpoint up to RBA [0x1ef.2532.10], SCN: 5374358

Sun Feb 02 02:27:52 2014

Checker run found 1 new persistent data failures

Check Point is a major event of Oracle internal control. After check point,Oracle, it is necessary to ensure that the SCN information of all data files and control files is consistent, and that the log files are consistent at the recovery point in time (RBA+SCN). This process forces you to go back and access the data file, and the loss of the file is found.

Since then, an error has occurred in the query.

SQL > select count (*) from rmtest.rm_tab

Select count (*) from rmtest.rm_tab

ORA-00376: cannot read file 11 at this time

ORA-01110: data file 11:'/ u01qqoradata _

Note: if left unchecked, Oracle's automatic incremental checkpoint will have a similar effect. At the same time, periodic Global Check will also find that "files are missing".

At this time, we carry out the repair operation. To recover files using file descriptors, the first step is to find an Oracle background process, most typically dbwr.

[oracle@bspdev datafile] $ps-ef | grep dbw

Oracle 4806 1 0 02:12? 00:00:00 ora_dbw0_wilson

Oracle 9076 4491 0 02:29 pts/0 00:00:00 grep dbw

Use the lsof-p command to find the file descriptor information for the dbwr process.

[root@bspdev datafile] # lsof-p 4806

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME

Oracle 4806 oracle cwd DIR 253,0 4096 10574090 / u01/oracle/dbs

Oracle 4806 oracle rtd DIR 253,0 4096 2 /

Oracle 4806 oracle txt REG 253,0 173515991 10579756 / u01/oracle/bin/oracle

(for reasons of space, there are omissions. )

Racle 4806 oracle 33uW REG 253,0 10493952 2978999 / u01/oradata/WILSON/datafile/mvtbltest01.dbf

Oracle 4806 oracle 34uW REG 253,0 30416896 524875 / u01/oradata/WILSON/datafile/o1_mf_temp_7xt46489_.tmp

Oracle 4806 oracle 35r REG 253,0 1074176 10595009 / u01/oracle/rdbms/mesg/oraus.msb

It includes all the file descriptors opened by dbwr, and we can't see the deleted file rmdtest01.dbf. The file descriptor directory also has no corresponding FD content. Note: due to some circumstances, the Oracle process deletes the file descriptor!

At this point, we check the status of the file.

SQL > select online_status from dba_data_files where file_id=11

ONLINE_STATUS

-

RECOVER

Files have been forcibly removed from the file system by offline. When you think about it, this process is a result of check point.

The end result of Check Point is that all data files are marked with the same SCN record in the file header, before the dirty blocks are written to the data file. If a data file entity no longer exists, this operation must not be completed. Oracle chose one way to force the file to be "deleted". So we saw the following paragraph in the log.

ORA-01171: datafile 11 going offline due to error advancing checkpoint

For deleted files, it is understandable that Oracle turns off the file descriptor.

The failure of this experiment tells us that using file descriptors for data recovery is not 100% effective. If it takes a long time, or if many special operations have been done, this weak file descriptor will disappear!

Let's conduct a successful experiment.

4. A "successful" experiment

With the help of RMAN backup, we returned to the pre-lab state.

[root@bspdevdatafile] # ls

Mvtbltest01.dbf o1_mf_temp_7xt46489_.tmp

O1_mf_example_7xt46m9x_.dbf o1_mf_undotbs1_7xt3yzl5_.dbf.bak

O1_mf_jpatest_87y6v8qc_.dbf o1_mf_undotbs1_92l5b0v4_.dbf

O1_mf_nbscommo_820frtg1_.dbf o1_mf_users_805nxydh_.dbf

O1_mf_nbscommo_820ft5y5_.dbf rmdtest01.dbf

O1_mf_sysaux_7xt3yzkb_.dbf tts_simple01.dbf

O1_mf_system_7xt3yzhj_.dbf

Delete the data file.

[root@bspdev datafile] # rm rmdtest01.dbf

Rm: remove regular file `rmdtest01.dbf'? Y

[root@bspdev datafile] # ls

Mvtbltest01.dbf o1_mf_system_7xt3yzhj_.dbf

O1_mf_example_7xt46m9x_.dbf o1_mf_temp_7xt46489_.tmp

O1_mf_jpatest_87y6v8qc_.dbf o1_mf_undotbs1_7xt3yzl5_.dbf.bak

O1_mf_nbscommo_820frtg1_.dbf o1_mf_undotbs1_92l5b0v4_.dbf

O1_mf_nbscommo_820ft5y5_.dbf o1_mf_users_805nxydh_.dbf

O1_mf_sysaux_7xt3yzkb_.dbf tts_simple01.dbf

After deletion, we can still query it with the help of file descriptors.

SQL > select count (*) from rmtest.rm_tab

COUNT (*)

-

145462

Look for the dbwr process and determine the process number.

[root@bspdev datafile] # ps-ef | grep dbw

Oracle 9405 1 0 02:45? 00:00:00 ora_dbw0_wilson

Root 9716 4466 0 02:56 pts/0 00:00:00 grep dbw

Use lsof-p to determine the file descriptor information.

[root@bspdev datafile] # lsof-p 9405

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME

Oracle 9405 oracle cwd DIR 253,0 4096 10574090 / u01/oracle/dbs

(for reasons of space, there are omissions. )

Oracle 9405 oracle 29uW REG 253,0 209723392 525165 / u01/oradata/WILSON/datafile/o1_mf_nbscommo_820frtg1_.dbf

Oracle 9405 oracle 30uW REG 253,0 104865792 525166 / u01/oradata/WILSON/datafile/o1_mf_nbscommo_820ft5y5_.dbf

Oracle 9405 oracle 31uW REG 253,0 104865792 525484 / u01/oradata/WILSON/datafile/o1_mf_jpatest_87y6v8qc_.dbf

Oracle 9405 oracle 32uW REG 253,0 10493952 525541 / u01/oradata/WILSON/datafile/tts_simple01.dbf

Oracle 9405 oracle 33uW REG 253,0 10493952 2978999 / u01/oradata/WILSON/datafile/mvtbltest01.dbf

Oracle 9405 oracle 34uW REG 253,0 30416896 524875 / u01/oradata/WILSON/datafile/o1_mf_temp_7xt46489_.tmp

Oracle 9405 oracle 35r REG 253,0 1074176 10595009 / u01/oracle/rdbms/mesg/oraus.msb

Oracle 9405 oracle 36uW REG 253 0 1048584192 2979001 / u01/oradata/WILSON/datafile/rmdtest01.dbf (deleted)

Note: lsof is the best tool for describing file descriptors, including the FD column. We found the information line of rmdtest01.dbf from the connection of dbwr. Among them FD=36uw. This 36 represents the name of the connection file.

The process number of the federated dbwr is 9405, and we find all the file characters in the directory / proc/9405/fd.

[root@bspdev datafile] # cd / proc/9405/fd

[root@bspdev fd] # ls

0 10 12 14 16 18 2 21 23 25 27 29 30 32 34 36 5 7 9

1 11 13 15 17 19 20 22 24 26 28 3 31 33 35 4 6 8

The Ls command can also list information.

[root@bspdev fd] # ls-l

Total 0

Lr-x- 1 oracle oinstall 64 Feb 2 02:56 0-> / dev/null

Lmurwx-1 oracle oinstall 64 Feb 2 02:56 1-> / dev/null

Lrwx- 1 oracle oinstall 64 Feb 2 02:56 10-> / u01/oracle/dbs/lkinstwilson (deleted)

Lrwx- 1 oracle oinstall 64 Feb 2 02:56 34-> / u01/oradata/WILSON/datafile/o1_mf_temp_7xt46489_.tmp

Lr-x- 1 oracle oinstall 64 Feb 2 02:56 35-> / u01/oracle/rdbms/mesg/oraus.msb

Lrwx- 1 oracle oinstall 64 Feb 2 02:56 36-> / u01/oradata/WILSON/datafile/rmdtest01.dbf (deleted)

Lmurwx-1 oracle oinstall 64 Feb 2 02:56 9-> / home/oracle/oradiag_oracle/diag/clients/user_oracle/host_1437849207_76/trace/ora_9293_3085993664.trm

The corresponding file for this 36 still exists, that is, the rmdtest01.dbf file that has been deleted. Copy it out.

[root@bspdev fd] # cp 36 / u01/oradata/WILSON/datafile/rmdtest01res.dbf

[root@bspdev fd] #

After the copy comes out, it is best to make some conversion. First offline it, then control the rename action in the file, and finally recover and online operation. The specific process is often the author's special article (http://blog.itpub.net/17203031/viewspace-773628/).

SQL > alter database datafile 11 offline

Database altered

Rename operation.

-- reporting errors

SQL > alter database rename file'/ u01qoradataAccord WILSONUniplex datafileUniverse rmdtest01.dbf' to'/ u01 oradataAccording to WILSONAccording to WILSONAccording to

Alter database rename file'/ u01aporadaya to'/ u01qoradata to'/ u01inoradataUniverse Wildtest01res.dbf'

ORA-01511: error renaming log / data file

ORA-01141: error renaming datafile 11:00-New file not found'/ u01UniUniverse oradata hand WILSONhand datafileUniverse rmdtest01res.dbf'

ORA-01110: data file 11:'/ u01qqoradata _

ORA-27041: unable to open file

Linux Error: 13: Permission denied

Additional information: 9

The reason is that the copy permission is root and the ownership information needs to be modified manually.

[root@bspdev datafile] # ls-l | grep rmd

-rw-r- 1 root root 1048584192 Feb 2 03:01 rmdtest01res.dbf

[root@bspdev datafile] # chown oracle:oinstall rmdtest01res.dbf

[root@bspdev datafile] # ls-l | grep rmd

-rw-r- 1 oracle oinstall 1048584192 Feb 2 03:01 rmdtest01res.dbf

Rename file.

SQL > alter database rename file'/ u01qoradataAccord WILSONUniplex datafileUniverse rmdtest01.dbf' to'/ u01 oradataAccording to WILSONAccording to WILSONAccording to

Database altered

SQL > recover datafile 11

Media recovery complete.

SQL > alter database datafile 11 online

Database altered.

Use the recovery advisor tool in rman to determine whether an error exists.

[oracle@bspdev ~] $rman nocatalog

Recovery Manager: Release 11.2.0.1.0-Production on Sun Feb 2 03:06:43 2014

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

RMAN > connect target /

Connected to target database: WILSON (DBID=3906514064)

Using target database control file instead of recovery catalog

RMAN > list failure

No failures found that match specification

The recovery was completed and the experiment was successful.

This is the end of the article on "the successful and failed experiment of how to recover by using file descriptors under Linux". I hope the above content can be helpful to you, so that you can learn more knowledge. if you think the article is good, please share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report