How to deal with File corruption in Oracle 07/13 Update SLTechnology News&Howtos

How to deal with File corruption in Oracle

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

Today, I would like to talk to you about how to deal with file corruption in Oracle. Many people may not know much about it. In order to make you understand better, the editor has summarized the following for you. I hope you can get something according to this article.

Overview

This article will introduce you to the phenomenon of oracle file corruption and countermeasures. Please note that all recoveries are based on backup, so please turn on the daily backup of the database.

The article will be expanded from the following files:

a. Password file

b. Parameter file

c. Control file

d. Data files (general tablespace data files, other tablespace data files such as system, sysaux, undo)

e. Log files (current, active, inactive)

Before the formal experiment, I would like to ask a question: which of the above documents is the deadliest? You are welcome to leave a message at the end of the article.

Environmental preparation

This experiment is carried out in oracle 11G archive mode. Before the experiment, make a full backup of the database.

Create a regular tablespace and some test tables create tablespace tbs01 datafile'/ u01 image RMAN from dba_objects where rownum backup database; / / full library backup RMAN > list backup / / View backup BS Key Type LV Size Device Type Elapsed Time Completion Time--21 Full 1.14G DISK 00:01:33 17-MAR-20 BP Key : 21 Status: AVAILABLE Compressed: NO Tag: TAG20200317T133425 Piece Name: / home/oracle/backupdir/ORCLTEST_2750922031_133_1_20200317_1035293665.bkp List of Datafiles in backup set 21File LV Type Ckp SCN Ckp Time Name---- 1 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/system01.dbf2 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/sysaux01.dbf3 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/undotbs01.dbf4 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/users01.dbf5 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/example01.dbf6 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/ Tbs01.dbfBS Key Type LV Size Device Type Elapsed Time Completion Time--22 Full 9.73M DISK 00:00:02 17-MAR-20 BP Key: 22 Status: AVAILABLE Compressed: NO Tag: TAG20200317T133602 Piece Name: / home/oracle/backupdir/c-2750922031-20200317-00SPFILE Included: Modification time: 17-MAR-20SPFILE db_unique_name: ORCLTEST Control File Included: Ckp SCN: 1606985 Ckp time: 17-MAR-20

The password file is damaged

File description: the password file stores the sys password

Simulated failure: clear the file

Echo'> $ORACLE_HOME/dbs/orapworcltest / / orcltest is the instance name of the database

Symptom: using sys to log in through oracle net to report a password error

Sqlplus sys/123456@10.40.16.120:1521/orcltest as sysdba

Fix: regenerate the password file using the built-in tool orapwd

Orapwd file=$ORACLE_HOME/dbs/orapworcltest password=123456 force=y / / force=y if the original password file exists, it is forced to be overwritten

Parameter file corrupted

File description: the parameter file here refers to spfile, which stores the parameters started by the instance and the path of the control file

Simulated failure: clear the file

Echo'> $ORACLE_HOME/dbs/spfileorcltest.ora

Phenomenon: error is reported when modifying database parameters

SQL > alter system set open_cursors=400;alter system set open_cursors=400*ERROR at line 1:ORA-01565: error in identifying file'/u01/app/oracle/product/11.2.0/db_1/dbs/spfileorcltest.ora'ORA-27046: file size is not a multiple of logical block sizeAdditional information: 1

Repair: using rman to restore parameter files

RMAN > list backup of spfile BS Key Type LV Size Device Type Elapsed Time Completion Time--22 Full 9.73M DISK 00:00:02 17-MAR-20BP Key: 22 Status: AVAILABLE Compressed: NO Tag: TAG20200317T133602Piece Name: / home/oracle/backupdir/c-2750922031-20200317-00SPFILE Included: Modification time: 17-MAR-20SPFILE db_unique_name: ORCLTESTRMAN > restore spfile to'/ home/oracle/spfileorcltest.ora' from'/ home/oracle/backupdir/c-2750922031-20200317-00' Mv / home/oracle/spfileorcltest.ora / u01/app/oracle/product/11.2.0/db_1/dbs/ SQL > shutdown immediate SQL > startup

Note that when restoring spfile, if you restore to the original location of spfile, you will report ORA-32011: cannot restore SPFILE to location already being used by the instance

So you need to restore to a new path and move it manually.

PS: parameter files can also be created directly from memory, making it easier (create spfile='/home/oracle/spfileorcltest.ora' from memory;)

Control file corruption

File description: control file records database file information and log information, etc.

View control file

SQL > show parameter control_files NAME TYPE VALUE-control_files String / u01/app/oracle/oradata/orclte st/control01.ctl

Simulated failure: empty the file

Echo'> / u01/app/oracle/oradata/orcltest/control01.ctl

Phenomenon: the normal addition, deletion, modification and query of the front desk will not be affected, but once the log is switched or the checkpoint is generated, the database is down.

SQL > alter system switch logfile;alter system switch logfile*ERROR at line 1:ORA-03113: end-of-file on communication channelProcess ID: 3433Session ID: 1 Serial number: 5

Database alert log

Tue Mar 17 17:39:06 2020Errors in file / u01/app/oracle/diag/rdbms/orcltest/orcltest/trace/orcltest_ckpt_3415.trc:ORA-00202: control file:'/ u01/app/oracle/oradata/orcltest/control01.ctl'ORA-27072: File I error...LGWR O error...LGWR (ospid: 3413): terminating the instance due to error 227Tue Mar 17 17:40:37 2020System state dump requested by (instance=1, osid=3413 (LGWR)) Summary= [abnormal instance termination] .System State dumped to trace file / u01/app/oracle/diag/rdbms/orcltest/orcltest/trace/orcltest_diag_3403_20200317174037.trcDumping diagnostic data in directory= [cdmp_20200317174037], requested by (instance=1, osid=3413 (LGWR)), summary= [abnormal instance termination] .Instance terminated by LGWR, pid = 3413

You can see that the ckpt process first found that the control file was corrupted, and then the instance was killed by the lgwr process. Maybe when you are doing the experiment, you may find that the instance was killed by ckpt, which is also possible. Anyway, what is certain is that the instance will definitely fail in the end.

Repair: using rman to restore control files

Rman target / RMAN > startup nomount RMAN > restore controlfile from'/ home/oracle/backupdir/c-2750922031-20200317-00; RMAN > alter database mount; RMAN > recover database; / / this step actually uses archivedlog + redolog to restore the control file RMAN > alter database open resetlogs

Description:

a. Do not use the way to delete control files to simulate the experiment, this is because the ckpt, lgwr processes have opened the control file, there is a mirror of the control file in memory, and the rm command can not delete the handle to the control file that these processes have opened. So you'll find that the instance didn't hang up.

b. After the resetlogs of the database, the previous backup is invalidated, so you should do a complete database as soon as possible.

c. As you may have noticed, there is only one controlfile in this experiment, and when the controlfile is destroyed, the instance dies. If it is the multiplexing of controlfile, what if one of the controlfile breaks down the database? Let me first come to my conclusion: as long as one of the controlfile is broken, the instance will collapse. At the same time, the alert log will indicate which controlfile is damaged. The solution is to copy a good controlfile to replace the damaged controlfile and restart the library. Let's leave the experiment to everyone. Attached is a section of the alert log of my experiment (ORA-00227: corrupt block detected in control file: (block 1, # blocks 1) ORA-00202: control file:'/ u01)

Summary:

1. The recovery of control files will not lose any transactions, but the database resetlogs will be required, which will invalidate the previous backup slices, so it is best to make a full library backup after restoring the control files.

two。 It is best to set two control files, one can take advantage of the other if it is broken, and the impact on the database and recovery time are minimal.

Data file corruption

In order to continue the experiment, please delete all the previous archive logs and backup files manually and make a complete picture of the current database.

RMAN > backup database; / / full library backup

6.1 ordinary data files are corrupted

Simulated failure: empty the file

Echo'> / u01/app/oracle/oradata/orcltest/tbs01.dbf / / tbs01 is a common tablespace data file

Phenomenon: query the object on the data file to report an error

SQL > select * from scott.t01; / / T01 table on the tbs01.dbf file select * from scott.t01 * ERROR at line 1:ORA-01115: IO error reading block from file (block #) ORA-01110: data file 6:'/ u01/app/oracle/oradata/orcltest/tbs01.dbf'ORA-27072: File I get O errorAdditional information: 4Additional information: 130

Repair: first offline the data file, then restore it using rman restore, and finally online

SQL > alter database datafile 6 offline;RMAN > restore datafile 6 / rman > recover datafile 6; SQL > alter database datafile 6 online

6.2 system tablespace data file is corrupt

Simulated failure: empty the file

Echo'> / u01/app/oracle/oradata/orcltest/system01.dbf

Phenomenon: error reported in query data dictionary

SQL > select * from dba_users;select * from dba_users * ERROR at line 1:ORA-00604: error occurred at recursive SQL level 1ORA-01115: IO error reading block from file (block #) ORA-01110: data file 1:'/ u01/app/oracle/oradata/orcltest/system01.dbf'ORA-27072: File Iram O errorAdditional information: 4Additional information: 95524

Repair: close the library first, then restore it using rman restore, and finally start the library

SQL > shutdown abortSQL > startup mountRMAN > restore datafile 1 / rman > recover datafile 1; SQL > alter database open

6.3 sysaux and undo table space data files are corrupted

The processing method of file corruption in sysaux table space is the same as that of ordinary table space data file, and that of undo table space is the same as that of system table space data file, because the data file of undo table space can not be offline. Limited to the space to omit the experimental steps, only post the phenomenon of file damage.

Sysaux tablespace file corruption: an error was reported by objects accessing sysaux tablespace

SQL > select * from sys.WRI$_OPTSTAT_HISTHEAD_HISTORY; ERROR: ORA-01578: ORACLE data block corrupted (file # 2, block # 986) ORA-01110: data file 2:'/ u01

Undo tablespace file corruption: all modification operations reported errors

SQL > insert into scott.t01 select * from scott.t01 Insert into scott.t01 select * from scott.t01 * ERROR at line 1:ORA-00603: ORACLE server session terminated by fatal errorORA-01578: ORACLE data block corrupted (file # 3, block # 144) ORA-01110: data file 3:'/ u01/app/oracle/oradata/orcltest/undotbs01.dbf'ORA-01578: ORACLE data block corrupted (file # 3, block # 144) ORA-01110: data file 3:'/ u01/app/oracle/oradata/orcltest/undotbs01.dbf'Process ID: 2835Session ID: 20 Serial number: 85

The log file is corrupt

7.1 inactive or active log files are corrupted

View the current log status: log groups currently being written by current-, log groups not archived by active-, and log groups archived by inactive-

SQL > select a.groupmakers, a.member, b.status from v$logfile a, v$log b where a.group#=b.group# order by group# GROUP# MEMBER STATUS-1 / u01/app/oracle/oradata/orcltest/redo01.log INACTIVE 2 / u01/app/oracle/oradata/orcltest/redo02.log CURRENT 3 / u01/app/oracle/oradata/orcltest/redo03.log INACTIVE

Simulated failure: clear the inactive log file

Echo'> / u01/app/oracle/oradata/orcltest/redo03.log

Phenomenon: when the database switches to the log group, the database does not know that there is a problem with the log file on disk, but writes the contents to the log file in the copy of memory. When switching, the log file will find that there is a problem with the log, and then the alert log will report an error, but the normal operation of the database will not be affected, but the log group will be skipped in the future.

SQL > insert into scott.t01 select * from scott.t01; / / insert a table repeatedly to generate a large number of logs

Observe the alert log

Errors in file / u01/app/oracle/diag/rdbms/orcltest/orcltest/trace/orcltest_arc0_9006.trc: ORA-00313: open failed for members of log group 3 of thread 1 ORA-00312: online log 3 thread 1:'/ u01 skgfifi Master archival failure Master archival failure: 313 SQL > alter system switch logfile

Looking at v$log, you can see that group 3 has not been used.

Fix: reinitialize the log file

The log corruption of SQL > alter database clear unarchived logfile group 3; / / active is similarly handled. The database archive will be broken after using this command, so you should make a full database backup immediately after restoring the log group.

7.2 current log files are corrupted

In order to continue the experiment, please delete all the previous archive logs and backup files manually and make a complete picture of the current database.

RMAN > backup database; / / full library backup

View current log status

SQL > select a.groupmakers, a.member, b.status from v$logfile a, v$log b where a.group#=b.group# order by group# GROUP# MEMBER STATUS- 1 / u01/app/oracle/oradata/orcltest/redo01.log INACTIVE2 / u01/app/oracle/oradata/orcltest/redo02.log INACTIVE3 / u01/app/oracle/oradata/orcltest/redo03.log CURRENTSQL > create table scott.t02 as select * from dba_users

Simulated failure: current log file emptied

Echo'> / u01/app/oracle/oradata/orcltest/redo03.log

Phenomenon: the normal addition, deletion, modification and query of the front desk will not be affected, but once the switching log database goes down

SQL > create table scott.t03 as select * from dba_users;SQL > alter system switch logfile;alter system switch logfile*ERROR at line 1:ORA-03113: end-of-file on communication channelProcess ID: 3758Session ID: 1 Serial number: 9

View alert Log

Errors in file / u01/app/oracle/diag/rdbms/orcltest/orcltest/trace/orcltest_lgwr_8969.trc: ORA-00316: log 2 of thread 1, type 0 in header is not log file ORA-00312: online log 2 thread 1:'/ u01online log thread 1:'/ u01U01 LGWR LGWR orcltestLGWR (ospid: 8969): terminating the instance due to error 316 Instance terminated by LGWR, pid = 3458

Restore: open with incomplete recovery

Sqlplus / as sysdba SQL > startup mount SQL > recover database until cancel; / / incomplete recovery

SQL > alter database open resetlogs; / / you will find that it failed to launch the library alter database open resetlogs*ERROR at line 1:ORA-01194: file 1 needs more recovery to be consistentORA-01110: data file 1:'/ u01ActionActionoradataPlacement orcltestSystem01.dbf'

At this time, you need to add implicit parameters and start it again.

SQL > alter system set "_ allow_resetlogs_corruption" = true scope=spfile; SQL > shutdown abortSQL > startup mountSQL > recover database until cancel; / / incomplete recovery input cancelSQL > alter database open resetlogs

Description:

a. Libraries recovered in this way may result in data loss, and there is no guarantee of success.

b. After the restore is successful, you should export all the tables using expdp and rebuild the library.

c. In the above experiment, each log group has only one member. What if each log group has two member?

First of all, let me come to my conclusion: damaging any of the member has no impact on the database, but when you switch to a log group with member damage, you will alert ORA-00313 ORA-00312 ORA-27048 in the alert log. The solution is to delete the member and add it again. There is no need to restart the database. I will not show the experiment. So it is best to set 2 members in each group of logs.

Here I have a question that I can't figure out: when the inactive log is destroyed, when the database is switched to the corrupted log, the database writes normally, but an error is reported during the log switch, which is understandable, because there is a copy of the corrupted log in the system memory, and all writes may be in memory. When switching, the log file must be closed, so an error message is prompted. The current log is destroyed and the database is written normally, but the database crashes directly when the log is switched. I don't understand why there is such a difference between the two.

After reading the above, do you have any further understanding of how to deal with file corruption in Oracle? If you want to know more knowledge or related content, please follow the industry information channel, thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.