What to Do When a Large Amount of GES Information Appears on One Node in a RAC Environment


This article describes in detail how to handle a large amount of GES information appearing on one node in a RAC environment. I hope interested readers find it a useful reference.

In a RAC environment, occasional resource contention between the two nodes is nothing surprising, but when this kind of message shows up regularly and frequently, something is wrong.
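When such messages pile up, a quick first check is to look for blockers and waiters across both instances. A minimal sketch against GV$LOCK (the filter and ordering here are just one reasonable choice):

SQL> -- in RAC, BLOCK = 2 means the lock is global and may be blocking a remote waiter
SQL> SELECT INST_ID, SID, TYPE, ID1, ID2, LMODE, REQUEST, CTIME, BLOCK
  2  FROM GV$LOCK
  3  WHERE BLOCK > 0 OR REQUEST > 0
  4  ORDER BY INST_ID, CTIME DESC;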

This article continues troubleshooting the problem from the previous article: a large amount of GES information appearing on one node in the RAC environment.

After the session holding the lock was killed, the problem was not completely resolved; the M000 process was soon restarted in the background and ran into the same dead end.
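For reference, the removal in the previous article used the standard KILL SESSION syntax; the SID and SERIAL# below are hypothetical stand-ins for the values identified there:

SQL> -- hypothetical SID,SERIAL# for illustration only
SQL> ALTER SYSTEM KILL SESSION '245,11001' IMMEDIATE;

Re-querying the transaction enqueues shows the TX locks are simply back: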

SQL> SELECT SID, TYPE, ID1, ID2, LMODE, REQUEST, CTIME, BLOCK
  2  FROM V$TRANSACTION_ENQUEUE;

       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
       245 TX    1048582     120892          6          0          3          2
       265 TX     786453     122545          6          0        332          2
       520 TX     917569     123789          6          0       2568          2

SQL> SELECT SID, SERIAL#, STATUS, MODULE, ACTION
  2  FROM V$SESSION
  3  WHERE SID IN (265, 520);

       SID    SERIAL# STATUS   MODULE       ACTION
---------- ---------- -------- ------------ ------------------------------
       265      24021 ACTIVE   MMON_SLAVE   Auto-Flush Slave Action
       520      20179 ACTIVE   MMON_SLAVE   Auto-DBFUS Action

SQL> SELECT SID, EVENT, P1TEXT, P1, P2TEXT, P2, P3TEXT, P3, SECONDS_IN_WAIT
  2  FROM V$SESSION_WAIT
  3  WHERE SID IN (265, 520);

       SID EVENT                    P1TEXT         P1 P2TEXT         P2 P3TEXT         P3 SECONDS_IN_WAIT
---------- ------------------------ ------- --------- ------- --------- ------- --------- ---------------
       265 gc current request       file#           4 block#      31685 id#      33619976             447
       520 db file sequential read  file#           4 block#       2486 blocks          1            2680
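Both waits point at file 4. One standard way to identify which segment a file#/block# pair belongs to is to probe DBA_EXTENTS; a sketch using block 31685 from the output above (note this scan can be slow on large databases):

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE
  2  FROM DBA_EXTENTS
  3  WHERE FILE_ID = 4
  4  AND 31685 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;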

Clearly the problem had not really been solved, and it looked related to the cluster environment, so the next step was to check the clusterware logs:

bash-3.00$ cd $ORACLE_HOME/../crs/log/newtrade2
bash-3.00$ cd cssd/
bash-3.00$ tail -500 ocssd.log

[CSSD]2009-12-19 19:19:?? [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b6d400) proc(100b70510) pid() proto
[CSSD]2009-12-19 19:19:47.155 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b51f00) proc(100b6ad60) pid() proto
[CSSD]2009-12-19 19:19:47.237 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b3a100) proc(100babc70) pid() proto
[CSSD]2009-12-19 19:19:47.477 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100ba2f80) proc(100bac710) pid() proto
[CSSD]2009-12-19 19:19:47.568 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b3a360) proc(100b750f0) pid() proto
[CSSD]2009-12-19 19:19:47.568 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100b6e3c0) proc(100bb37c0) pid() proto
[CSSD]2009-12-19 19:19:47.568 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100ba9990) proc(100ba7590) pid() proto
[CSSD]2009-12-19 19:19:47.584 [11] >TRACE: clssgmClientConnectMsg: Connect from con(100bb0450) proc(100bb0ed0) pid() proto
[CSSD]2009-12-19 19:23:?? [15] >TRACE: clssnmPollingThread: node newtrade1 (1) missed(2) checkin(s)
[CSSD]2009-12-19 19:27:16.680 [15] >TRACE: clssnmPollingThread: node newtrade2 (2) missed(2) checkin(s)
[CSSD]2009-12-19 19:49:?? [15] >TRACE: clssnmPollingThread: node newtrade1 (1) missed(2) checkin(s)
[CSSD]2009-12-19 19:??:?? [15] >TRACE: clssnmPollingThread: node newtrade2 (2) missed(2) checkin(s)
...
[CSSD]2009-12-23 15:22:01.500 [15] >TRACE: clssnmPollingThread: node newtrade2 (2) missed(2) checkin(s)
[CSSD]2009-12-23 15:22:29.780 [15] >TRACE: clssnmPollingThread: node newtrade2 (2) missed(2) checkin(s)

When the lock information was queried in the previous article, the locked session's transaction had started on the 19th, and the log output here changes character starting at 19:23 on the 19th. This indicates that the problem was introduced at that point in time.
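One quick way to confirm when the polling-thread warnings began, assuming the [CSSD]YYYY-MM-DD line prefix shown in the excerpt above, is to count them per day:

bash-3.00$ # cut -c7-16 grabs the date that follows the 6-character [CSSD] prefix
bash-3.00$ grep 'clssnmPollingThread' ocssd.log | cut -c7-16 | sort | uniq -c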

Next, check the evmd logs:

bash-3.00$ cd ../evmd/
bash-3.00$ tail -100 evmdOUT.log

2009-12-19 19:16:27 Read failed for subscriber object 101c01b40

2009-12-19 19:16:30 Read failed for subscriber object 101c01b40

2009-12-19 19:17:00 Read failed for subscriber object 101c01b40

2009-12-19 19:17:13 Read failed for subscriber object 101c01b40

2009-12-19 19:17:33 Read failed for subscriber object 101c01b40

...

2009-12-19 19:19:43 Read failed for subscriber object 101c01b40

2009-12-19 19:19:44 Read failed for subscriber object 101c01b40

2009-12-19 19:19:45 Read failed for subscriber object 101c01b40

2009-12-19 19:19:46 Read failed for subscriber object 101c01b40

2009-12-19 19:19:46 Read failed for subscriber object 101c01b40

2009-12-19 19:28:47 Read failed for subscriber object 101c01b40

This time there were no new errors; on the contrary, the error messages that had been appearing steadily stopped after roughly the same point in time. Again, this points to a problem with the cluster environment.

I intended to check the status of each cluster component, but crs_stat on the problem node simply hung with no response; I aborted it after waiting more than two hours.

Running it on the other node was no problem at all, and it even showed the problem node's resources as normal:

bash-3.00$ ./crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....rade.db application    ONLINE    ONLINE    newtrade2
ora....e1.inst application    ONLINE    ONLINE    newtrade1
ora....e2.inst application    ONLINE    ONLINE    newtrade2
ora....E1.lsnr application    ONLINE    OFFLINE
ora....de1.gsd application    ONLINE    ONLINE    newtrade1
ora....de1.ons application    ONLINE    ONLINE    newtrade1
ora....de1.vip application    ONLINE    ONLINE    newtrade1
ora....E2.lsnr application    ONLINE    ONLINE    newtrade2
ora....de2.gsd application    ONLINE    ONLINE    newtrade2
ora....de2.ons application    ONLINE    ONLINE    newtrade2
ora....de2.vip application    ONLINE    ONLINE    newtrade2
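Besides crs_stat, 10.2 also offers crsctl health checks of the individual daemons, which can be a quicker probe when crs_stat itself hangs; a sketch, run from the CRS home's bin directory:

bash-3.00$ ./crsctl check crs     # overall CSS/CRS/EVM health
bash-3.00$ ./crsctl check cssd    # cluster synchronization services
bash-3.00$ ./crsctl check crsd    # cluster ready services
bash-3.00$ ./crsctl check evmd    # event manager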

Checking node 2 turned up multiple racgmain processes:

bash-3.00$ ps -ef | grep /data
    root 10854     1  0   Sep 30 ?      670:09 /data/oracle/product/10.2/crs/bin/crsd.bin reboot
  oracle 10839     1  0   Sep 30 ?        0:00 sh -c 'ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/newtrade
  oracle 14657 14326  0   Sep 30 ?        2:45 /data/oracle/product/10.2/crs/bin/evmlogger.bin -o /data/oracle/product/10.2/cr
  oracle 14417 14266  0   Sep 30 ?        0:00 sh -c /bin/sh -c 'ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/new
  oracle 14418 14417  0   Sep 30 ?        0:00 /bin/sh -c ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/newtrade2/
  oracle 14419 14418  0   Sep 30 ?     1233:07 /data/oracle/product/10.2/crs/bin/ocssd.bin
  oracle 14326 10839  0   Sep 30 ?      103:58 /data/oracle/product/10.2/crs/bin/evmd.bin
    root 14316 14265  0   Sep 30 ?       32:00 /data/oracle/product/10.2/crs/bin/oprocd run -t 1000 -m 500 -f
  oracle 15625     1  0   Oct 06 ?        0:00 /data/oracle/product/10.2/crs/opmn/bin/ons -d
  oracle 15627 15625  0   Oct 06 ?       20:23 /data/oracle/product/10.2/crs/opmn/bin/ons -d
  oracle 16028     1  0   Sep 30 ?      378:38 /data/oracle/product/10.2/database/bin/racgimon startd newtrade
    root 17341     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
  oracle 17311     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
  oracle 17331     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
  oracle 17335     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
  oracle 21094     1  0   Jul 27 ?       11:42 /data/oracle/product/10.2/database/bin/tnslsnr LISTENER -inherit
  oracle 17359     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check
  oracle  9866 24535  0 21:51:47 pts/1    0:00 grep /data
  oracle 17267     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/database/bin/racgmain check
  oracle 17353     1  0   Dec 19 ?        0:00 /data/oracle/product/10.2/crs/bin/racgmain check

No such processes exist on node 1:

bash-3.00$ ps -ef | grep /data
  oracle  2150     1  0   Aug 11 ?        0:00 sh -c 'ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/newtrade
  oracle  6277     1  0   Aug 12 ?        8:05 /data/oracle/product/10.2/database/bin/tnslsnr LISTENER -inherit
  oracle  4294  4293  0   Aug 11 ?        0:00 /bin/sh -c ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/newtrade1/
    root  2153     1  0   Aug 11 ?      203:08 /data/oracle/product/10.2/crs/bin/crsd.bin reboot
  oracle  4293  4150  0   Aug 11 ?        0:00 sh -c /bin/sh -c 'ulimit -c unlimited; cd /data/oracle/product/10.2/crs/log/new
  oracle  4887  4885  0   Aug 11 ?        5:16 /data/oracle/product/10.2/crs/opmn/bin/ons -d
  oracle  4175  2150  0   Aug 11 ?       29:00 /data/oracle/product/10.2/crs/bin/evmd.bin
  oracle  4529  4175  0   Aug 11 ?        0:46 /data/oracle/product/10.2/crs/bin/evmlogger.bin -o /data/oracle/product/10.2/cr
  oracle  4295  4294  0   Aug 11 ?      284:08 /data/oracle/product/10.2/crs/bin/ocssd.bin
  oracle  9739     1  0   Aug 11 ?      110:18 /data/oracle/product/10.2/database/bin/racgimon startd newtrade
  oracle  4885     1  0   Aug 11 ?        0:00 /data/oracle/product/10.2/crs/opmn/bin/ons -d
    root  4228  4149  0   Aug 11 ?        7:30 /data/oracle/product/10.2/crs/bin/oprocd run -t 1000 -m 500 -f
  oracle 23064 15394  0 21:49:20 pts/1    0:00 grep /data

Moreover, these processes started at exactly the time the problem appeared, which is highly suspicious.
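To see at a glance how many stale racgmain processes are lingering, and to capture their PIDs for the kill step, something like the following works (the [r] in the pattern keeps grep from matching itself):

bash-3.00$ ps -ef | grep '[r]acgmain check'
bash-3.00$ ps -ef | grep '[r]acgmain check' | awk '{print $2}'    # PIDs only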

Clearing these processes has no impact on the running RAC database, so try killing them:

bash-3.00$ kill -9 17311 17331 17335 17359 17267 17353

Then switch to root to kill the racgmain process that was started by the root user:

root@newtrade2 # kill -9 17341

Unfortunately, even after killing these processes the problem was still not completely resolved.

It seemed the only option left was to completely restart the cluster environment on the problem node.

bash-3.00$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.3.0 - Production on Wed Dec 23 22:24:34 2009

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

Connected.

SQL> shutdown abort
ORACLE instance shut down.

A normal SHUTDOWN IMMEDIATE was no longer able to shut down the database, so SHUTDOWN ABORT was the only way.

Next, as root, run the /etc/init.d/init.crs command:

root@newtrade2 # /etc/init.d/init.crs stop

Shutting down Oracle Cluster Ready Services (CRS):

Dec 23 22:26:03.803 | INF | daemon shutting down

The shutdown of the cluster environment itself hung. The problem was evidently serious, and a reboot of the server was the only remaining option.

root@newtrade2 # /etc/init.d/init.crs start

After the server rebooted, the cluster environment was brought up manually, and the problem was finally solved.
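One way to confirm the fix after the restart is to repeat the earlier check; the long-lived TX enqueues held by the MMON slaves should no longer be there:

SQL> -- should no longer show long-lived TX enqueues from the MMON slaves
SQL> SELECT SID, TYPE, ID1, ID2, LMODE, REQUEST, CTIME, BLOCK
  2  FROM V$TRANSACTION_ENQUEUE;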

In a non-RAC environment, a database restart inevitably affects users, whereas in a RAC environment restarting one node does not stop users from accessing the database.

Of course, there are two sides to everything: the problem itself was caused by the RAC environment, and a similar situation would not arise on a single-instance database.

That concludes this look at handling a large amount of GES information on one node in a RAC environment. I hope the content above is helpful to you.
