Oracle Study - DataGuard Maximum Protection Mode Failure (ORA-16198)
System environment:
Operating system: RedHat EL5
Oracle: Oracle 11gR2 (11.2.0.1.0)
Failure phenomenon:
A physical standby configuration hit the following failure when the primary was converted from Maximum Performance to Maximum Protection:
10:13:06 SYS@ prod1 > startup force mount;
ORACLE instance started.

Total System Global Area  418484224 bytes
Fixed Size                  1336932 bytes
Variable Size             281020828 bytes
Database Buffers          130023424 bytes
Redo Buffers                6103040 bytes
Database mounted.
10:13:30 SYS@ prod1 > select name, protection_mode from v$database;

NAME      PROTECTION_MODE
--------- --------------------
PROD1     MAXIMUM PROTECTION
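For reference, a switch to Maximum Protection is typically done on the mounted primary with standard statements like the following (a minimal sketch; the exact commands the author ran are not shown in the post):

shutdown immediate
startup mount
-- a SYNC AFFIRM destination and standby redo logs must already be in place
alter database set standby database to maximize protection;
alter database open;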
Opening the database then failed:
10:07:04 SYS@ prod1 > alter database open
Alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 4612
Session ID: 1 Serial number: 5
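The error points to a trace file under the ADR. To locate the diagnostic directories yourself, query the standard 11g view v$diag_info:

-- 'Diag Trace' holds the text alert log, 'Diag Alert' the XML version
SQL> select name, value from v$diag_info where name in ('Diag Trace', 'Diag Alert');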
Checking the alert log:
Alter database open
Thu Jun 11 10:07:10 2015
LGWR: STARTING ARCH PROCESSES
Thu Jun 11 10:07:10 2015
ARC0 started with pid=19, OS id=4614
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thu Jun 11 10:07:10 2015
ARC1 started with pid=20, OS id=4616
Thu Jun 11 10:07:10 2015
ARC2 started with pid=21, OS id=4618
ARC1: Archival started
ARC2: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
LGWR: Primary database is in MAXIMUM PROTECTION mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
Thu Jun 11 10:07:11 2015
NSS2 started with pid=18, OS id=4620
Thu Jun 11 10:07:40 2015
ORA-16198: LGWR received timedout error from KSR
Errors in file /u01/app/oracle/diag/rdbms/bjdb/prod1/trace/prod1_lgwr_4565.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
LGWR: Error 16198 verifying archivelog destination LOG_ARCHIVE_DEST_2
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Continuing...
LGWR: Minimum of 1 applicable standby database required
Errors in file /u01/app/oracle/diag/rdbms/bjdb/prod1/trace/prod1_lgwr_4565.trc:
ORA-16072: a minimum of one standby database destination is required
Errors in file /u01/app/oracle/diag/rdbms/bjdb/prod1/trace/prod1_lgwr_4565.trc:
ORA-16072: a minimum of one standby database destination is required
LGWR (ospid: 4565): terminating the instance due to error 16072
Instance terminated by LGWR, pid = 4565
As the log shows, the remote archive destination failed: LGWR timed out while verifying LOG_ARCHIVE_DEST_2 (ORA-16198), and because Maximum Protection requires at least one synchronized standby destination (ORA-16072), LGWR terminated the instance.
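Before changing anything, it helps to confirm exactly which destination is failing and with what error; a quick check against the standard view v$archive_dest (the instance only needs to be mounted):

SQL> select dest_id, dest_name, status, error from v$archive_dest where dest_id in (1, 2);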
Solution:
Following Oracle's official recommendation, increase the net_timeout value (on both primary and standby):
10:10:23 SYS@ prod1 > alter system set log_archive_dest_2='service=shdb lgwr sync affirm valid_for=(online_logfiles,primary_role) net_timeout=30 db_unique_name=shdb';

10:11:25 SYS@ shdb > alter system set log_archive_dest_2='service=bjdb lgwr sync affirm valid_for=(online_logfiles,primary_role) net_timeout=30 db_unique_name=bjdb';
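To verify that the change took effect, read the value back on both sides:

SQL> select dest_name, net_timeout from v$archive_dest where dest_id = 2;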
Add standby redo logs:
Primary (add the standby redo logs while still in Maximum Performance mode):
10:20:35 SYS@ prod1 > select group#, status, bytes from v$log;

    GROUP# STATUS          BYTES
---------- ---------- ----------
         1 INACTIVE     52428800
         2 CURRENT      52428800
         3 INACTIVE     52428800

10:20:54 SYS@ prod1 > select member from v$logfile;

MEMBER
--------------------------------------------------
/u01/app/oracle/oradata/prod1/redo03.log
/u01/app/oracle/oradata/prod1/redo02.log
/u01/app/oracle/oradata/prod1/redo01.log
/u01/app/oracle/oradata/prod1/std_redo01.log
/u01/app/oracle/oradata/prod1/std_redo02.log

6 rows selected.

10:21:03 SYS@ prod1 > alter database add standby logfile
10:21:25   2  '/u01/app/oracle/oradata/prod1/std_redo03.log' size 50m;

Database altered.

10:21:46 SYS@ prod1 > alter database add standby logfile
10:21:51   2  '/u01/app/oracle/oradata/prod1/std_redo04.log' size 50m;

Database altered.

10:01:48 SYS@ shdb > select member from v$logfile;

MEMBER
--------------------------------------------------
/u01/app/oracle/oradata/shdb/redo03.log
/u01/app/oracle/oradata/shdb/redo02.log
/u01/app/oracle/oradata/shdb/redo01.log
/disk2/arch_prod11_0_881851982.dbf
/u01/app/oracle/oradata/shdb/std_redo01.log
/u01/app/oracle/oradata/shdb/std_redo02.log

6 rows selected.
Standby:
10:18:17 SYS@ shdb > alter database open;

Database altered.

10:20:21 SYS@ shdb > select group#, status, bytes from v$log;

    GROUP# STATUS                BYTES
---------- ---------------- ----------
         1 CLEARING           52428800
         2 CLEARING           52428800
         3 CLEARING_CURRENT   52428800

10:20:45 SYS@ shdb > alter database add standby logfile
10:22:41   2  '/u01/app/oracle/oradata/shdb/std_redo03.log' size 50m;

Database altered.

10:22:57 SYS@ shdb > alter database add standby logfile
10:23:02   2  '/u01/app/oracle/oradata/shdb/std_redo04.log' size 50m;

Database altered.

10:23:14 SYS@ shdb > col member for a50
10:23:20 SYS@ shdb > select group#, member from v$logfile;

    GROUP# MEMBER
---------- --------------------------------------------------
         3 /u01/app/oracle/oradata/shdb/redo03.log
         2 /u01/app/oracle/oradata/shdb/redo02.log
         1 /u01/app/oracle/oradata/shdb/redo01.log
         5 /u01/app/oracle/oradata/shdb/std_redo01.log
         6 /u01/app/oracle/oradata/shdb/std_redo02.log
         7 /u01/app/oracle/oradata/shdb/std_redo03.log
         8 /u01/app/oracle/oradata/shdb/std_redo04.log

8 rows selected.
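As a rule of thumb from the Data Guard documentation, standby redo logs should be the same size as the online redo logs, and each thread should have one more standby group than it has online groups. A minimal check on either side, using the standard views v$log and v$standby_log:

-- online redo groups and size per thread
SQL> select thread#, count(*) as online_groups, max(bytes) as bytes from v$log group by thread#;

-- the standby redo logs just added
SQL> select group#, thread#, bytes, status from v$standby_log;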
After the above handling, the problem remained: in Maximum Protection mode the primary still could not be opened, although it opened normally under Maximum Availability and Maximum Performance. The root cause is still being investigated.
Reference documentation:
Analysis and resolution of the ORA-16198 database error:
http://blog.itpub.net/28546804/viewspace-1260003/
1. First, let's look at the official documentation on the ORA-16198 error.
...
The cause of the error may be that net_timeout is set too low. In earlier versions the default was 10; it is recommended to change it to 30.
...
The net_timeout attribute in log_archive_dest_2 on the primary is set too low, so that LNS could not finish sending the redo block within 10 seconds in this example.
...
If a setting of 30 is still not enough, check the disk I/O usage or the network transfer.
...
Note: If NET_TIMEOUT attribute has already been set to 30, and you still get ORA-16198, that means LNS couldn't finish sending redo block in 30 seconds.
The slowness may be caused by:
1. Operating System. Please keep track of OS usage (e.g. iostat).
2. Network. Please keep track of network flow (e.g. tcpdump).
...
It may also be a bug. The affected versions are 11.2.0.1 and 10.2.0.4; upgrading to 11.2.0.2 or later is recommended.
...
Bug 9259587 Multiple LGWR reconnect attempts in Data Guard MAXIMUM_AVAILABILITY
This note gives a brief overview of bug 9259587.
Affects:
Product (Component): Oracle Server (Rdbms)
Range of versions believed to be affected: versions below 12.1
Versions confirmed as being affected: 11.2.0.1, 10.2.0.4
Platforms affected: generic (all / most platforms affected)
Fixed:
This issue is fixed in 12.1 (future release) and 11.2.0.2 (server patch set).
Symptoms:
Related To:
Hang (Process Spins)
Active Dataguard (ADG)
Physical Standby Database / Dataguard
Description
...
The error that occurred is probably similar to the following
...
Rediscovery Notes:
Alert log contains messages like:
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Errors in file
/app/oracle/diag/rdbms/ora11g_dga/ora11g/trace/ora11g_lgwr_290838.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
LGWR: Network asynch I/O wait error 16198 log 2 service 'ora11g_DGb'
LGWR: Error 16198 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'ora11g_DGb'
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Failed to archive log 2 thread 1 sequence 1422 (16198)
...
In a Data Guard configuration using LGWR SYNC transport on one or more LOG_ARCHIVE_DEST_n parameters, and using a protection mode of MAXIMUM_AVAILABILITY, then if the primary database becomes disconnected from the standby database, LGWR continues to attempt to reconnect to the standby database. It should instead avoid attempts to reconnect until an ARCH process has re-established communication with the standby database.
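To see whether a configuration is currently exposed to this behavior, compare the requested protection mode with the level actually achieved; if they differ, redo transport is not keeping up:

SQL> select protection_mode, protection_level from v$database;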
What is certain is this: the error occurs mainly in a Data Guard architecture, when redo from the primary is not shipped to the standby within the specified time, or cannot be sent to the standby at all. The two main causes of the failure are explained below:
2. Failure caused by a parameter set too low
The NET_TIMEOUT value in LOG_ARCHIVE_DEST_2 may be set too low, so the redo cannot be transferred within the specified time. Setting it to 30 is recommended.
Query NET_TIMEOUT:
SQL> select dest_name, net_timeout from v$archive_dest;

DEST_NAME            NET_TIMEOUT
-------------------- -----------
LOG_ARCHIVE_DEST_1             0
LOG_ARCHIVE_DEST_2            30
... (output elided)
View the LOG_ARCHIVE_DEST_2 parameters:
SQL> show parameter log_archive_dest_2

The value is 'service=orcl_std reopen=120 lgwr sync valid_for=(online_logfiles,primary_role) db_unique_name=orcl_std'.
I did not set the NET_TIMEOUT parameter explicitly; the default is 30 because my version is 11.2.0.3.
If your parameter is not 30, please modify it as follows:
SQL> alter system set log_archive_dest_2='service=orcl_std reopen=120 lgwr sync net_timeout=30 valid_for=(online_logfiles,primary_role) db_unique_name=orcl_std';
Then observe whether the problem recurs.
3. Failures caused by other factors, such as a congested network or busy storage I/O
If the error is caused by a congested network or busy storage, use operating system tools such as tcpdump, iostat, or vmstat to check the usage of the relevant resources, or ask the network or storage administrator to assist with the analysis.
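Alongside the operating system tools, the database itself records transport problems; a minimal in-database check against the standard view v$dataguard_status:

SQL> select timestamp, severity, error_code, message from v$dataguard_status where severity in ('Error', 'Fatal') order by timestamp;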
If all of the above shows no problem, there is another possibility: the SYS password was changed on the primary or the standby alone, without the same change being made on the other side. In that case authentication from the primary to the standby is very likely to fail.
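To check for this, compare the password file entries on the two sides; if they diverge, recreate the standby's password file so that it matches the primary's (for example by copying the primary's orapw file over). A quick check on each database:

SQL> select username, sysdba, sysoper from v$pwfile_users;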
4. A database bug
If none of the above methods solve the problem, no specific cause can be identified, and your database version happens to be 11.2.0.1 or 10.2.0.4, then upgrade it.
5. Summary
Problems of this kind should be analyzed from several angles: a parameter value set too low, storage usage, network transmission, a SYS password changed on one side only, a database bug, and so on.