This article describes how to deal with a large amount of GES (Global Enqueue Service) information appearing on one node of a RAC environment. The content is fairly detailed; interested readers can use it for reference, and I hope it proves helpful.
In a RAC environment it is not unusual for the two nodes to contend for resources, but when this kind of message shows up regularly and frequently, something is wrong.
This article continues troubleshooting the problem from the previous article: a large amount of GES information appearing on one node of the RAC environment.
Killing the session that held the lock did not completely resolve the problem. The M000 background process was soon restarted and ran into the same dead end again:
SQL> SELECT SID, TYPE, ID1, ID2, LMODE, REQUEST, CTIME, BLOCK
  2  FROM V$TRANSACTION_ENQUEUE;

       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
       245 TX    1048582     120892          6          0          3          2
       265 TX     786453     122545          6          0        332          2
       520 TX     917569     123789          6          0       2568          2
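The BLOCK value of 2 means these TX enqueues may be blocking other sessions globally. As a side check (a minimal sketch, not one of the original steps, assuming the standard GV$LOCK view), holders and waiters can be listed cluster-wide:

-- Sketch: show TX enqueues that are held in a blocking state or being waited for
SELECT INST_ID, SID, TYPE, ID1, ID2, LMODE, REQUEST, CTIME, BLOCK
FROM GV$LOCK
WHERE TYPE = 'TX'
AND (BLOCK > 0 OR REQUEST > 0)
ORDER BY ID1, ID2, REQUEST;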
SQL> SELECT SID, SERIAL#, STATUS, MODULE, ACTION
  2  FROM V$SESSION
  3  WHERE SID IN (265, 520);

       SID    SERIAL# STATUS   MODULE               ACTION
---------- ---------- -------- -------------------- -------------------------
       265      24021 ACTIVE   MMON_SLAVE           Auto-Flush Slave Action
       520      20179 ACTIVE   MMON_SLAVE           Auto-DBFUS Action
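Both blocking sessions belong to MMON slaves (the M00n background processes). To correlate them with their operating-system processes, a query along the following lines can be used (a sketch based on the usual V$SESSION/V$PROCESS join; not part of the original article):

-- Sketch: map the two MMON slave sessions to their OS process IDs (SPID)
SELECT S.SID, S.SERIAL#, S.PROGRAM, P.SPID
FROM V$SESSION S, V$PROCESS P
WHERE S.PADDR = P.ADDR
AND S.SID IN (265, 520);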
SQL> SELECT SID, EVENT, P1TEXT, P1, P2TEXT, P2, P3TEXT, P3, SECONDS_IN_WAIT
  2  FROM V$SESSION_WAIT
  3  WHERE SID IN (265, 520);

       SID EVENT                    P1TEXT        P1 P2TEXT        P2 P3TEXT          P3 SECONDS_IN_WAIT
---------- ------------------------ ------- -------- ------- -------- -------- --------- ---------------
       265 gc current request       file#          4 block#     31685 id#       33619976             447
       520 db file sequential read  file#          4 block#      2486 blocks           1            2680
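Session 265 is stuck on a gc current request for file 4, block 31685. If needed, the segment owning that block can be identified from the file and block numbers (a sketch using DBA_EXTENTS; the numbers are taken from the wait output above):

-- Sketch: find which segment contains the contended block (file 4, block 31685)
SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE
FROM DBA_EXTENTS
WHERE FILE_ID = 4
AND 31685 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;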
Clearly the problem has not really been solved, and it looks as though it is related to the cluster layer. Check the Clusterware logs:
bash-3.00$ cd $ORACLE_HOME/../crs/log/newtrade2
bash-3.00$ cd cssd/
bash-3.00$ tail -500 ocssd.log
[CSSD] 2009-12-19 19:19:47 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b6d400) proc(100b70510) pid() proto
[CSSD] 2009-12-19 19:19:47.155 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b51f00) proc(100b6ad60) pid() proto
[CSSD] 2009-12-19 19:19:47.237 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b3a100) proc(100babc70) pid() proto
[CSSD] 2009-12-19 19:19:47.477 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100ba2f80) proc(100bac710) pid() proto
[CSSD] 2009-12-19 19:19:47.568 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b3a360) proc(100b750f0) pid() proto
[CSSD] 2009-12-19 19:19:47.568 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b6e3c0) proc(100bb37c0) pid() proto
[CSSD] 2009-12-19 19:19:47.568 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100ba9990) proc(100ba7590) pid() proto
[CSSD] 2009-12-19 19:19:47.584 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100bb0450) proc(100bb0ed0) pid() proto
[CSSD] 2009-12-19 19:23 [15] >TRACE: clssnmPollingThread: node newtrade1(1) missed(2) checkin(s)
[CSSD] 2009-12-19 19:27:16.680 [15] >TRACE: clssnmPollingThread: node newtrade2(2) missed(2) checkin(s)
[CSSD] 2009-12-19 19:49 [15] >TRACE: clssnmPollingThread: node newtrade1(1) missed(2) checkin(s)
[CSSD] 2009-12-19 [15] >TRACE: clssnmPollingThread: node newtrade2(2) missed(2) checkin(s)
...
[CSSD] 2009-12-23 15:22:01.500 [15] >TRACE: clssnmPollingThread: node newtrade2(2) missed(2) checkin(s)
[CSSD] 2009-12-23 15:22:29.780 [15] >TRACE: clssnmPollingThread: node newtrade2(2) missed(2) checkin(s)
When the lock information was queried in the previous article, the blocking session's transaction had started on the 19th, and the log above shows that its output changes at 19:23 on the 19th: from that point on, CSSD keeps reporting missed checkins. This suggests the problem was introduced at that point in time.
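To confirm the turning point, the missed-checkin messages can be counted per day (a rough sketch; the grep pattern and awk field position are assumptions based on the log lines shown above):

bash-3.00$ grep "missed" ocssd.log | awk '{print $2}' | sort | uniq -c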
Continue by checking the evmd log:
bash-3.00$ cd ../evmd/
bash-3.00$ tail -100 evmdOUT.log
2009-12-19 19:16:27 Read failed for subscriber object 101c01b40
2009-12-19 19:16:30 Read failed for subscriber object 101c01b40
2009-12-19 19:17:00 Read failed for subscriber object 101c01b40
2009-12-19 19:17:13 Read failed for subscriber object 101c01b40
2009-12-19 19:17:33 Read failed for subscriber object 101c01b40
...
2009-12-19 19:19:43 Read failed for subscriber object 101c01b40
2009-12-19 19:19:44 Read failed for subscriber object 101c01b40
2009-12-19 19:19:45 Read failed for subscriber object 101c01b40
2009-12-19 19:19:46 Read failed for subscriber object 101c01b40
2009-12-19 19:19:46 Read failed for subscriber object 101c01b40
2009-12-19 19:28:47 Read failed for subscriber object 101c01b40
No new error messages appear this time; instead, the "Read failed" errors that had been logged continuously stop right around the same point in time. Again this points to a problem in the cluster layer.
I intended to check the status of each Clusterware resource, but crs_stat gave no response on the problem node; after waiting more than two hours I aborted it.
Running the same command on the other node works without any issue, and it even shows the problem node's resources as normal:
bash-3.00$ ./crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....rade.db application    ONLINE    ONLINE    newtrade2
ora....e1.inst application    ONLINE    ONLINE    newtrade1
ora....e2.inst application    ONLINE    ONLINE    newtrade2
ora....E1.lsnr application    ONLINE    OFFLINE
ora....de1.gsd application    ONLINE    ONLINE    newtrade1
ora....de1.ons application    ONLINE    ONLINE    newtrade1
ora....de1.vip application    ONLINE    ONLINE    newtrade1
ora....E2.lsnr application    ONLINE    ONLINE    newtrade2
ora....de2.gsd application    ONLINE    ONLINE    newtrade2
ora....de2.ons application    ONLINE    ONLINE    newtrade2
ora....de2.vip application    ONLINE    ONLINE    newtrade2
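When crs_stat hangs as it did on the problem node, a lighter-weight check of the individual Clusterware daemons may still respond and can help narrow down which layer is stuck. A sketch using the 10.2-style crsctl check commands (run from the CRS home on the problem node; treat the exact subcommands as an assumption to verify against your release):

bash-3.00$ /data/oracle/product/10.2/crs/bin/crsctl check crs
bash-3.00$ /data/oracle/product/10.2/crs/bin/crsctl check cssd
bash-3.00$ /data/oracle/product/10.2/crs/bin/crsctl check crsd
bash-3.00$ /data/oracle/product/10.2/crs/bin/crsctl check evmd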
Checking node 2 reveals multiple racgmain processes:
bash-3.00$ ps -ef | grep /data
    root 10854     1  0   Sep 30 ?      670:09 /data/oracle/product/10.2/crs/bin/crsd.bin reboot
  oracle 10839     1  0   Sep 30 ?        0:00 sh -c 'ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/newtrade
  oracle 14657 14326  0   Sep 30 ?        2:45 /data/oracle/product/10.2/crs/bin/evmlogger.bin -o /data/oracle/product/10.2/cr
  oracle 14417 14266  0   Sep 30 ?        0:00 /bin/sh -c 'ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/new
  oracle 14418 14417  0   Sep 30 ?        0:00 /bin/sh -c ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/newtrade2/
  oracle 14419 14418  0   Sep 30 ?     1233:07 /data/oracle/product/10.2/crs/bin/ocssd.bin
  oracle 14326 10839  0   Sep 30 ?      103:58 /data/oracle/product/10.2/crs/bin/evmd.bin
    root 14316 14265  0   Sep 30 ?       32:00 /data/oracle/product/10.2/crs/bin/oprocd run -t 1000 -m 500 -f
  oracle 15625     1  0   Oct 06 ?        0:00 /data/oracle/product/10.2/crs/opmn/bin/ons -d
  oracle 15627 15625  0   Oct 06 ?       20:23 /data/oracle/product/10.2/crs/opmn/bin/ons -d
  oracle 16028     1  0   Sep 30 ?      378:38 /data/oracle/product/10.2/database/bin/racgimon startd newtrade
    root 17341     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
  oracle 17311     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
  oracle 17331     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
  oracle 17335     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
  oracle 21094     1  0   Jul 27 ?       11:42 /data/oracle/product/10.2/database/bin/tnslsnr LISTENER -inherit
  oracle 17359     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
  oracle  9866 24535  0 21:51:47 pts/1    0:00 grep /data
  oracle 17267     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/database/bin/racgmain check
  oracle 17353     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
On node 1, no such processes exist:
bash-3.00$ ps -ef | grep /data
  oracle  2150     1  0   Aug 11 ?        0:00 sh -c 'ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/newtrade
  oracle  6277     1  0   Aug 12 ?        8:05 /data/oracle/product/10.2/database/bin/tnslsnr LISTENER -inherit
  oracle  4294  4293  0   Aug 11 ?        0:00 /bin/sh -c ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/newtrade1/
    root  2153     1  0   Aug 11 ?      203:08 /data/oracle/product/10.2/crs/bin/crsd.bin reboot
  oracle  4293  4150  0   Aug 11 ?        0:00 /bin/sh -c 'ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/new
  oracle  4887  4885  0   Aug 11 ?        5:16 /data/oracle/product/10.2/crs/opmn/bin/ons -d
  oracle  4175  2150  0   Aug 11 ?       29:00 /data/oracle/product/10.2/crs/bin/evmd.bin
  oracle  4529  4175  0   Aug 11 ?        0:46 /data/oracle/product/10.2/crs/bin/evmlogger.bin -o /data/oracle/product/10.2/cr
  oracle  4295  4294  0   Aug 11 ?      284:08 /data/oracle/product/10.2/crs/bin/ocssd.bin
  oracle  9739     1  0   Aug 11 ?      110:18 /data/oracle/product/10.2/database/bin/racgimon startd newtrade
  oracle  4885     1  0   Aug 11 ?        0:00 /data/oracle/product/10.2/crs/opmn/bin/ons -d
    root  4228  4149  0   Aug 11 ?        7:30 /data/oracle/product/10.2/crs/bin/oprocd run -t 1000 -m 500 -f
  oracle 23064 15394  0 21:49:20 pts/1    0:00 grep /data
Moreover, these racgmain processes were all started on December 19, exactly when the problem appeared, which is very suspicious.
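Before killing anything, it is worth double-checking exactly which racgmain processes are the stale ones and collecting their PIDs (a simple sketch; grep -v grep just filters out the grep command itself):

bash-3.00$ ps -ef | grep "racgmain check" | grep -v grep
bash-3.00$ ps -ef | grep "racgmain check" | grep -v grep | awk '{print $2}'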
Removing these processes should have no impact on the operation of the RAC database, so try to kill them:
bash-3.00$ kill -9 17311 17331 17335 17359 17267 17353
Then switch to the root user to kill the racgmain process that was started by root:
root@newtrade2 # kill -9 17341
Unfortunately, even after these processes were killed the problem was still not completely solved.
It seems the only way out is to completely restart the Clusterware stack on the problem node.
bash-3.00$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Wed Dec 23 22:24:34 2009
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected.
SQL> shutdown abort
ORACLE instance shut down.
A normal SHUTDOWN IMMEDIATE could no longer bring the database down, so SHUTDOWN ABORT was the only option.
Next, run the /etc/init.d/init.crs command as the root user:
root@newtrade2 # /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Dec 23 22:26:03.803 | INF | daemon shutting down
Even the shutdown of the cluster stack hung. The problem is evidently quite serious, and the only option left was to reboot the server.
root@newtrade2 # /etc/init.d/init.crs start
After the system reboot, the cluster stack was started manually and the problem was finally resolved.
If this were not a RAC environment, a database restart would inevitably affect users; in a RAC environment, restarting one node does not prevent users from accessing the database.
Of course, every coin has two sides: the problem itself was caused by the RAC environment, and a similar situation would not occur on a single-instance database.
That is how to handle a large amount of GES information on one node in a RAC environment. I hope the content above is helpful; if you found the article worthwhile, feel free to share it so more people can see it.