The problem and Analysis of sudden downtime of Database 07/11 Update SLTechnology News&Howtos

The problem and Analysis of sudden downtime of Database

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Last night, the database of an environment suddenly went down while doing a stress test. This problem is rather urgent. Check the log file right away.

I saw the following paragraph, which reported an os-level linux error. Hint that there is no room.

Fri Mar 14 19:16:47 2014

Archived Log entry 192 added for thread 1 sequence 192 ID 0x1ed7a02c dest 1:

Fri Mar 14 19:39:24 2014

Incremental checkpoint up to RBA [0xc0.2aa5fb.0], current log tail at RBA [0xc1.5d29f.0]

Fri Mar 14 19:46:37 2014

Completed checkpoint up to RBA [0xc1.2.10], SCN: 252702724

Fri Mar 14 20:09:32 2014

Incremental checkpoint up to RBA [0xc1.5d36a.0], current log tail at RBA [0xc1.b8498.0]

Fri Mar 14 20:13:15 2014

KCF: read, write or open error, block=0xa6b82 online=1

File=1'/ TEST1/db05/oradata/PRDTEST1/TEMP_1.dbf'

Error=27061 txt: 'Linux-x86_64 Error: 28: No space left on device

Additional information:-1

Additional information: 8192'

Errors in file / test01/oracle/adm/PRDTEST1/diag/rdbms/prdTEST1/PRDTEST1/trace/PRDTEST1_dbw7_19235.trc:

ORA-63999: data file suffered media failure

ORA-01114: IO error writing block to file 5001 (block # 682882)

ORA-01110: data file 5001:'/ TEST1/db05/oradata/PRDTEST1/TEMP_1.dbf'

ORA-27061: waiting for async I/Os failed

Linux-x86_64 Error: 28: No space left on device

Additional information:-1

Additional information: 8192

DBW7 (ospid: 19235): terminating the instance due to error 63999

Fri Mar 14 20:13:16 2014

System state dump requested by (instance=1, osid=19235 (DBW7)), summary= [abnormal instance termination].

System State dumped to trace file / test01/oracle/adm/PRDTEST1/diag/rdbms/prdTEST1/PRDTEST1/trace/PRDTEST1_diag_19213.trc

Termination issued to instance processes. Waiting for the processes to exit

Fri Mar 14 20:13:31 2014

Instance termination failed to kill one or more processes

Instance terminated by DBW7, pid = 19235

Fri Mar 14 22:55:44 2014

Starting ORACLE instance (normal)

It's really not enough to start looking at the space in the file system right away. Urgently resize the files, get the library up first, and then coordinate the resources of the system.

Although the problem was solved immediately. However, in the case of file writes (error asynchronous io), the database instance will also down. It is indeed a very sensitive thing.

There is a similar error in metalink, but based on the NAS environment, which makes some system changes that cause this error to be somewhat different from this problem. (document ID 1557694.1)

I wonder if there are any measures to control if the data file fails to write, to ensure that the library will not be dropped to down. It was found that an implicit parameter (_ datafile_write_errors_crash_instance) was found after version 11.2.0.2.

The script to view the implied parameters is as follows.

Set linesize 132 column name format a30 column value format a25

Select

X.ksppinm name

Y.ksppstvl value

Y.ksppstdf isdefault

Decode (bitand (y.ksppstvfMagne7), 1memorialMODIFIED playbook page4 ismod

Decode (bitand (y.ksppstvfMagne2), 2 isadj

From sys.x$ksppi x, sys.x$ksppcv y

Where x.inst_id = userenv ('Instance')

And y.inst_id = userenv ('Instance')

And x.indx = y.indx

And x.ksppinm like'% _%'

Order by translate (x.ksppinm,'_','')

By default, the value of this parameter _ datafile_write_errors_crash_instance is true.

The explanation given by oracle is as follows, and a related bug (document ID 7691270.8) has been fixed in 11.2.0.2.

If _ datafile_write_errors_crash_instance = TRUE (default) then

Any write to a datafile which fails due to an IO error causes

An instance crash.

If _ datafile_write_errors_crash_instance = FALSE then the behaviour

Reverts to the previous behaviour (before this fix) such that

A write error to a datafile offlines the file (provided the DB is

In archivelog mode and the file is not in SYSTEM tablespace in

Which case the instance is aborted)

A simple test

You may wonder if the database will down if there is not enough tablespace and data file space. I simply tested it locally to see if the library down would be dropped if there was not enough file space when inserting in parallel. But to simulate errors in data files, you may need to use tools such as bbed to simulate them.

Step 1. First of all, in order to set the implied parameter to the default value true first

Alter system set "_ datafile_write_errors_crash_instance" = TRUE

Step 2. Then a dummy file is created to ensure that there is only a small portion of the space left in the file system.

Dd if=/dev/zero of=/u02/ora11g/hello.txt bs=1000M count=1

-rw-r--r-- 1 ora11g dba 1048576000 Mar 15 04:09 hello.txt

/ dev/sdb2 7.6g 7.2g 45m 100% / u02

There is only a small part of the space left, 45m.

Step 3. Create two table spaces. Let the data files grow automatically.

SQL > create tablespace test_data1 datafile'/ u02 autoextend on ora11g size testdata1.dbf'Uni2m

Tablespace created.

SQL > create tablespace test_data2 datafile'/ u02 autoextend on ora11g size Testdata2.dbf'Uni2m

Tablespace created.

Step 4: create two tables that belong to different tablespaces

SQL > create table test1 tablespace test_data1 as select * from cat

Table created.

SQL > create table test2 tablespace test_data2 as select * from user_objects

Table created.

Step 5: simply check the data in the table. Ensure that the amount of data is within a controllable range.

SQL > select count (*) from test1

COUNT (*)

four

SQL > select count (*) from test2

COUNT (*)

five

Step 6: then write the following three scripts to and from the two tables for continuous insertion and commit, respectively

Script 1 a.sh

Sqlplus test/test

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.