In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
Today, I would like to talk to you about how to deal with file corruption in Oracle. Many people may not know much about it. In order to make you understand better, the editor has summarized the following for you. I hope you can get something according to this article.
Overview
This article will introduce you to the phenomenon of oracle file corruption and countermeasures. Please note that all recoveries are based on backup, so please turn on the daily backup of the database.
The article will be expanded from the following files:
a. Password file
b. Parameter file
c. Control file
d. Data files (general tablespace data files, other tablespace data files such as system, sysaux, undo)
e. Log files (current, active, inactive)
Before the formal experiment, I would like to ask a question: which of the above documents is the deadliest? You are welcome to leave a message at the end of the article.
Environmental preparation
This experiment is carried out in oracle 11G archive mode. Before the experiment, make a full backup of the database.
Create a regular tablespace and some test tables create tablespace tbs01 datafile'/ u01 image RMAN from dba_objects where rownum backup database; / / full library backup RMAN > list backup / / View backup BS Key Type LV Size Device Type Elapsed Time Completion Time--21 Full 1.14G DISK 00:01:33 17-MAR-20 BP Key : 21 Status: AVAILABLE Compressed: NO Tag: TAG20200317T133425 Piece Name: / home/oracle/backupdir/ORCLTEST_2750922031_133_1_20200317_1035293665.bkp List of Datafiles in backup set 21File LV Type Ckp SCN Ckp Time Name---- 1 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/system01.dbf2 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/sysaux01.dbf3 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/undotbs01.dbf4 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/users01.dbf5 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/example01.dbf6 Full 1606913 17-MAR-20 / u01/app/oracle/oradata/orcltest/ Tbs01.dbfBS Key Type LV Size Device Type Elapsed Time Completion Time--22 Full 9.73M DISK 00:00:02 17-MAR-20 BP Key: 22 Status: AVAILABLE Compressed: NO Tag: TAG20200317T133602 Piece Name: / home/oracle/backupdir/c-2750922031-20200317-00SPFILE Included: Modification time: 17-MAR-20SPFILE db_unique_name: ORCLTEST Control File Included: Ckp SCN: 1606985 Ckp time: 17-MAR-20
The password file is damaged
File description: the password file stores the sys password
Simulated failure: clear the file
Echo'> $ORACLE_HOME/dbs/orapworcltest / / orcltest is the instance name of the database
Symptom: using sys to log in through oracle net to report a password error
Sqlplus sys/123456@10.40.16.120:1521/orcltest as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Tue Mar 17 13:57:52 2020Copyright (c) 1982, 2013, Oracle. All rights reserved.ERROR:ORA-01017: invalid username/password; logon deniedEnter user-name:
Fix: regenerate the password file using the built-in tool orapwd
Orapwd file=$ORACLE_HOME/dbs/orapworcltest password=123456 force=y / / force=y if the original password file exists, it is forced to be overwritten
Parameter file corrupted
File description: the parameter file here refers to spfile, which stores the parameters started by the instance and the path of the control file
Simulated failure: clear the file
Echo'> $ORACLE_HOME/dbs/spfileorcltest.ora
Phenomenon: error is reported when modifying database parameters
SQL > alter system set open_cursors=400;alter system set open_cursors=400*ERROR at line 1:ORA-01565: error in identifying file'/u01/app/oracle/product/11.2.0/db_1/dbs/spfileorcltest.ora'ORA-27046: file size is not a multiple of logical block sizeAdditional information: 1
Repair: using rman to restore parameter files
RMAN > list backup of spfile BS Key Type LV Size Device Type Elapsed Time Completion Time--22 Full 9.73M DISK 00:00:02 17-MAR-20BP Key: 22 Status: AVAILABLE Compressed: NO Tag: TAG20200317T133602Piece Name: / home/oracle/backupdir/c-2750922031-20200317-00SPFILE Included: Modification time: 17-MAR-20SPFILE db_unique_name: ORCLTESTRMAN > restore spfile to'/ home/oracle/spfileorcltest.ora' from'/ home/oracle/backupdir/c-2750922031-20200317-00' Mv / home/oracle/spfileorcltest.ora / u01/app/oracle/product/11.2.0/db_1/dbs/ SQL > shutdown immediate SQL > startup
Note that when restoring spfile, if you restore to the original location of spfile, you will report ORA-32011: cannot restore SPFILE to location already being used by the instance
So you need to restore to a new path and move it manually.
PS: parameter files can also be created directly from memory, making it easier (create spfile='/home/oracle/spfileorcltest.ora' from memory;)
Control file corruption
File description: control file records database file information and log information, etc.
View control file
SQL > show parameter control_files NAME TYPE VALUE-control_files String / u01/app/oracle/oradata/orclte st/control01.ctl
Simulated failure: empty the file
Echo'> / u01/app/oracle/oradata/orcltest/control01.ctl
Phenomenon: the normal addition, deletion, modification and query of the front desk will not be affected, but once the log is switched or the checkpoint is generated, the database is down.
SQL > alter system switch logfile;alter system switch logfile*ERROR at line 1:ORA-03113: end-of-file on communication channelProcess ID: 3433Session ID: 1 Serial number: 5
Database alert log
Tue Mar 17 17:39:06 2020Errors in file / u01/app/oracle/diag/rdbms/orcltest/orcltest/trace/orcltest_ckpt_3415.trc:ORA-00202: control file:'/ u01/app/oracle/oradata/orcltest/control01.ctl'ORA-27072: File I error...LGWR O error...LGWR (ospid: 3413): terminating the instance due to error 227Tue Mar 17 17:40:37 2020System state dump requested by (instance=1, osid=3413 (LGWR)) Summary= [abnormal instance termination] .System State dumped to trace file / u01/app/oracle/diag/rdbms/orcltest/orcltest/trace/orcltest_diag_3403_20200317174037.trcDumping diagnostic data in directory= [cdmp_20200317174037], requested by (instance=1, osid=3413 (LGWR)), summary= [abnormal instance termination] .Instance terminated by LGWR, pid = 3413
You can see that the ckpt process first found that the control file was corrupted, and then the instance was killed by the lgwr process. Maybe when you are doing the experiment, you may find that the instance was killed by ckpt, which is also possible. Anyway, what is certain is that the instance will definitely fail in the end.
Repair: using rman to restore control files
Rman target / RMAN > startup nomount RMAN > restore controlfile from'/ home/oracle/backupdir/c-2750922031-20200317-00; RMAN > alter database mount; RMAN > recover database; / / this step actually uses archivedlog + redolog to restore the control file RMAN > alter database open resetlogs
Description:
a. Do not use the way to delete control files to simulate the experiment, this is because the ckpt, lgwr processes have opened the control file, there is a mirror of the control file in memory, and the rm command can not delete the handle to the control file that these processes have opened. So you'll find that the instance didn't hang up.
b. After the resetlogs of the database, the previous backup is invalidated, so you should do a complete database as soon as possible.
c. As you may have noticed, there is only one controlfile in this experiment, and when the controlfile is destroyed, the instance dies. If it is the multiplexing of controlfile, what if one of the controlfile breaks down the database? Let me first come to my conclusion: as long as one of the controlfile is broken, the instance will collapse. At the same time, the alert log will indicate which controlfile is damaged. The solution is to copy a good controlfile to replace the damaged controlfile and restart the library. Let's leave the experiment to everyone. Attached is a section of the alert log of my experiment (ORA-00227: corrupt block detected in control file: (block 1, # blocks 1) ORA-00202: control file:'/ u01)
Summary:
1. The recovery of control files will not lose any transactions, but the database resetlogs will be required, which will invalidate the previous backup slices, so it is best to make a full library backup after restoring the control files.
two。 It is best to set two control files, one can take advantage of the other if it is broken, and the impact on the database and recovery time are minimal.
Data file corruption
In order to continue the experiment, please delete all the previous archive logs and backup files manually and make a complete picture of the current database.
RMAN > backup database; / / full library backup
6.1 ordinary data files are corrupted
Simulated failure: empty the file
Echo'> / u01/app/oracle/oradata/orcltest/tbs01.dbf / / tbs01 is a common tablespace data file
Phenomenon: query the object on the data file to report an error
SQL > select * from scott.t01; / / T01 table on the tbs01.dbf file select * from scott.t01 * ERROR at line 1:ORA-01115: IO error reading block from file (block #) ORA-01110: data file 6:'/ u01/app/oracle/oradata/orcltest/tbs01.dbf'ORA-27072: File I get O errorAdditional information: 4Additional information: 130
Repair: first offline the data file, then restore it using rman restore, and finally online
SQL > alter database datafile 6 offline;RMAN > restore datafile 6 / rman > recover datafile 6; SQL > alter database datafile 6 online
6.2 system tablespace data file is corrupt
Simulated failure: empty the file
Echo'> / u01/app/oracle/oradata/orcltest/system01.dbf
Phenomenon: error reported in query data dictionary
SQL > select * from dba_users;select * from dba_users * ERROR at line 1:ORA-00604: error occurred at recursive SQL level 1ORA-01115: IO error reading block from file (block #) ORA-01110: data file 1:'/ u01/app/oracle/oradata/orcltest/system01.dbf'ORA-27072: File Iram O errorAdditional information: 4Additional information: 95524
Repair: close the library first, then restore it using rman restore, and finally start the library
SQL > shutdown abortSQL > startup mountRMAN > restore datafile 1 / rman > recover datafile 1; SQL > alter database open
6.3 sysaux and undo table space data files are corrupted
The processing method of file corruption in sysaux table space is the same as that of ordinary table space data file, and that of undo table space is the same as that of system table space data file, because the data file of undo table space can not be offline. Limited to the space to omit the experimental steps, only post the phenomenon of file damage.
Sysaux tablespace file corruption: an error was reported by objects accessing sysaux tablespace
SQL > select * from sys.WRI$_OPTSTAT_HISTHEAD_HISTORY; ERROR: ORA-01578: ORACLE data block corrupted (file # 2, block # 986) ORA-01110: data file 2:'/ u01
Undo tablespace file corruption: all modification operations reported errors
SQL > insert into scott.t01 select * from scott.t01 Insert into scott.t01 select * from scott.t01 * ERROR at line 1:ORA-00603: ORACLE server session terminated by fatal errorORA-01578: ORACLE data block corrupted (file # 3, block # 144) ORA-01110: data file 3:'/ u01/app/oracle/oradata/orcltest/undotbs01.dbf'ORA-01578: ORACLE data block corrupted (file # 3, block # 144) ORA-01110: data file 3:'/ u01/app/oracle/oradata/orcltest/undotbs01.dbf'Process ID: 2835Session ID: 20 Serial number: 85
The log file is corrupt
7.1 inactive or active log files are corrupted
View the current log status: log groups currently being written by current-, log groups not archived by active-, and log groups archived by inactive-
SQL > select a.groupmakers, a.member, b.status from v$logfile a, v$log b where a.group#=b.group# order by group# GROUP# MEMBER STATUS-1 / u01/app/oracle/oradata/orcltest/redo01.log INACTIVE 2 / u01/app/oracle/oradata/orcltest/redo02.log CURRENT 3 / u01/app/oracle/oradata/orcltest/redo03.log INACTIVE
Simulated failure: clear the inactive log file
Echo'> / u01/app/oracle/oradata/orcltest/redo03.log
Phenomenon: when the database switches to the log group, the database does not know that there is a problem with the log file on disk, but writes the contents to the log file in the copy of memory. When switching, the log file will find that there is a problem with the log, and then the alert log will report an error, but the normal operation of the database will not be affected, but the log group will be skipped in the future.
SQL > insert into scott.t01 select * from scott.t01; / / insert a table repeatedly to generate a large number of logs
Observe the alert log
Errors in file / u01/app/oracle/diag/rdbms/orcltest/orcltest/trace/orcltest_arc0_9006.trc: ORA-00313: open failed for members of log group 3 of thread 1 ORA-00312: online log 3 thread 1:'/ u01 skgfifi Master archival failure Master archival failure: 313 SQL > alter system switch logfile
Looking at v$log, you can see that group 3 has not been used.
Fix: reinitialize the log file
The log corruption of SQL > alter database clear unarchived logfile group 3; / / active is similarly handled. The database archive will be broken after using this command, so you should make a full database backup immediately after restoring the log group.
7.2 current log files are corrupted
In order to continue the experiment, please delete all the previous archive logs and backup files manually and make a complete picture of the current database.
RMAN > backup database; / / full library backup
View current log status
SQL > select a.groupmakers, a.member, b.status from v$logfile a, v$log b where a.group#=b.group# order by group# GROUP# MEMBER STATUS- 1 / u01/app/oracle/oradata/orcltest/redo01.log INACTIVE2 / u01/app/oracle/oradata/orcltest/redo02.log INACTIVE3 / u01/app/oracle/oradata/orcltest/redo03.log CURRENTSQL > create table scott.t02 as select * from dba_users
Simulated failure: current log file emptied
Echo'> / u01/app/oracle/oradata/orcltest/redo03.log
Phenomenon: the normal addition, deletion, modification and query of the front desk will not be affected, but once the switching log database goes down
SQL > create table scott.t03 as select * from dba_users;SQL > alter system switch logfile;alter system switch logfile*ERROR at line 1:ORA-03113: end-of-file on communication channelProcess ID: 3758Session ID: 1 Serial number: 9
View alert Log
Errors in file / u01/app/oracle/diag/rdbms/orcltest/orcltest/trace/orcltest_lgwr_8969.trc: ORA-00316: log 2 of thread 1, type 0 in header is not log file ORA-00312: online log 2 thread 1:'/ u01online log thread 1:'/ u01U01 LGWR LGWR orcltestLGWR (ospid: 8969): terminating the instance due to error 316 Instance terminated by LGWR, pid = 3458
Restore: open with incomplete recovery
Sqlplus / as sysdba SQL > startup mount SQL > recover database until cancel; / / incomplete recovery
SQL > alter database open resetlogs; / / you will find that it failed to launch the library alter database open resetlogs*ERROR at line 1:ORA-01194: file 1 needs more recovery to be consistentORA-01110: data file 1:'/ u01ActionActionoradataPlacement orcltestSystem01.dbf'
At this time, you need to add implicit parameters and start it again.
SQL > alter system set "_ allow_resetlogs_corruption" = true scope=spfile; SQL > shutdown abortSQL > startup mountSQL > recover database until cancel; / / incomplete recovery input cancelSQL > alter database open resetlogs
Description:
a. Libraries recovered in this way may result in data loss, and there is no guarantee of success.
b. After the restore is successful, you should export all the tables using expdp and rebuild the library.
c. In the above experiment, each log group has only one member. What if each log group has two member?
First of all, let me come to my conclusion: damaging any of the member has no impact on the database, but when you switch to a log group with member damage, you will alert ORA-00313 ORA-00312 ORA-27048 in the alert log. The solution is to delete the member and add it again. There is no need to restart the database. I will not show the experiment. So it is best to set 2 members in each group of logs.
Here I have a question that I can't figure out: when the inactive log is destroyed, when the database is switched to the corrupted log, the database writes normally, but an error is reported during the log switch, which is understandable, because there is a copy of the corrupted log in the system memory, and all writes may be in memory. When switching, the log file must be closed, so an error message is prompted. The current log is destroyed and the database is written normally, but the database crashes directly when the log is switched. I don't understand why there is such a difference between the two.
After reading the above, do you have any further understanding of how to deal with file corruption in Oracle? If you want to know more knowledge or related content, please follow the industry information channel, thank you for your support.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 214
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.