
Steps for resolving a RAC node that cannot start

2025-01-17 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

Problem: in a two-node RAC cluster, the clusterware on node 2 cannot be started. The ohas process is running, but the CRS and CSS processes are not.

1. Check the logs first.

Checking the RAC alert log shows the following error:

File rotation terminated. Log file: "/app/11.2.0.4/grid/log/uatdb02/client/olsnodes.log"

The log shows that the owner of olsnodes.l03 is not oracle.

Checking the file confirms that its owner and group are wrong.

Modify the file's owner and group:

chown root:root olsnodes.l03

After the change, olsnodes.log rotates normally and the alert log no longer reports this error. But with the error gone, the log offers no further clues. How can the investigation continue without log output?

2. Check the RAC configuration. My experience with RAC was limited to building clusters by following documentation, so I had little practice in troubleshooting and could only consult documentation and MOS while investigating.

Check the ASM configuration of RAC first.

The disk path output looked different from the RAC clusters I had built before, which suggested that this cluster was built with asmlib.

Running rpm -qa | grep asm showed that the relevant packages were indeed installed, so the guess was on the right track.

Previously I had always configured the ASM shared disks of RAC with udev and had never used asmlib. So I searched online for relevant posts and found some commands:

oracleasm scandisks

oracleasm listdisks

But knowing these commands alone did not solve the problem.

I ran the commands above to see their output.

scandisks reported "permission denied" on OCR1.

That gave me a direction: was the problem caused by access permissions on the shared disks?

I went to the shared disk directory, /dev/oracleasm/disks, to take a look.

On node 2 the owner and group were both root. If the permissions were wrong, what should the correct ones be? Fortunately, node 1 was still running normally, so I checked the shared disks on node 1.

On the healthy node the owner is grid and the group is asmadmin. Was this the reason the cluster could not start?
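The ownership comparison described here can be sketched as a small shell function. This is only an illustration, assuming GNU stat is available; the function name report_wrong_owner is hypothetical, while the directory and the expected grid:asmadmin value come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: print every entry in a directory whose owner:group
# differs from the expected value. Relies on GNU stat (-c).
report_wrong_owner() {
    dir="$1"
    expected="$2"
    for f in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$f" ] || continue
        actual=$(stat -c '%U:%G' "$f")
        if [ "$actual" != "$expected" ]; then
            printf '%s %s\n' "$f" "$actual"
        fi
    done
}

# Example call on a node (values taken from the article):
# report_wrong_owner /dev/oracleasm/disks grid:asmadmin
```

Run on both nodes, an empty result on node 1 and a list of root:root entries on node 2 would match what the article observed.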

I tried manually changing the owner and group on node 2.

(The permission changes themselves are omitted here.)

After the change, I tried to restart the cluster on node 2:

crsctl start crs

The error indicated that ohas was already started and that crs failed to start.

Reading the crsctl start help carefully, I found that start crs is the command that starts OHAS, and start cluster is the command that starts CRS. Evidently my understanding of RAC was not deep enough.
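The distinction between the two commands can be sketched as follows. This is a minimal illustration, not the author's procedure: the function name start_local_stack and the CRSCTL override variable are hypothetical, and the sketch assumes that crsctl check has exits with a non-zero status when OHAS is not running.

```shell
#!/bin/sh
# CRSCTL is a hypothetical override for illustration; by default the real
# crsctl binary on PATH is used.
CRSCTL="${CRSCTL:-crsctl}"

# If OHAS is already up, start only the cluster stack on this node;
# otherwise start OHAS first with "start crs".
# Assumption: "crsctl check has" returns non-zero when OHAS is down.
start_local_stack() {
    if "$CRSCTL" check has >/dev/null 2>&1; then
        "$CRSCTL" start cluster
    else
        "$CRSCTL" start crs
    fi
}
```

In the situation described in the article, ohas was already running, so the branch that runs crsctl start cluster is the one that applies.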

I tried to start the node 2 cluster again.

The output indicated that cssd failed to start. The same error also appeared in the alert log.

So I checked ocssd.log.

The errors in the ocssd log showed that the voting disk could not be found, so naturally the cluster could not start.

But the reason the disk could not be found was unknown. I turned to MOS and looked through more than a dozen documents using related keywords, but none matched this problem closely, and there was no usable solution.

So the problem reached a stalemate.

It was time to get off work, so I continued the next day.

The next day I logged on to the machine and tried crsctl start cluster again (knowing it would not work, but wanting to try anyway).

Sure enough, it did not disappoint: the error was still that the voting disk could not be found.

I checked the MOS documents and found some related commands.

crsctl query css votedisk

Node 2 returned no output; node 1 did.

kfod status=TRUE asm_diskstring='/dev/oracleasm/disks/*' disks=ALL

Node 1 returned quickly.

But node 2 hung.
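Because the command can hang on a faulty node, one option is to wrap it in a time limit. The sketch below assumes the coreutils timeout utility is installed; the function name run_limited and the 60-second limit in the example call are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report on stderr
# when it was killed for exceeding that limit.
# coreutils timeout exits with status 124 when the limit is hit.
run_limited() {
    limit="$1"
    shift
    rc=0
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Example call on a node (kfod arguments taken from the article):
# run_limited 60 kfod status=TRUE asm_diskstring='/dev/oracleasm/disks/*' disks=ALL
```

A timeout on one node and a prompt answer on the other gives the same signal as the hang described above, without tying up the terminal.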

Looking at the shared disks on node 2 again, the ownership had reverted to root, so I changed it manually once more (although changing it by hand after every reboot is not a workable solution).

I ran the command above again.

It reported an error.

It suddenly occurred to me that the asmlib configuration might differ between the two nodes. I was not familiar with asmlib; the online posts I had read covered how to configure asmlib, and I did not know how to inspect an existing configuration.

I tried running oracleasm configure, and found that the configurations of the two nodes were indeed inconsistent.

There was a problem with the ORACLEASM_UID and ORACLEASM_GID settings.

So I corrected them.
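The configuration comparison can be sketched as a small filter over the output of oracleasm configure. This assumes the command prints KEY=value lines such as ORACLEASM_UID=grid; the function name asmlib_owner is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "oracleasm configure" output on stdin and print
# the configured owner as "UID:GID".
# Assumption: the output consists of KEY=value lines.
asmlib_owner() {
    awk -F= '
        $1 == "ORACLEASM_UID" { uid = $2 }
        $1 == "ORACLEASM_GID" { gid = $2 }
        END { print uid ":" gid }
    '
}

# Example call on each node; the results should be identical:
# oracleasm configure | asmlib_owner
```

If the values differ, oracleasm configure -i can be used to enter the owner and group interactively on the node that is wrong, followed by oracleasm scandisks to check that the permission denied message is gone.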

View the status.

With the modification complete, scandisks no longer reported permission denied, so there should be no further problem.

Try to start the cluster again:

crsctl start cluster

It started successfully.

At this point the cluster startup problem was solved. However, judging from the status output, this RAC cluster still has many other problems.

Summary: deepen the understanding of RAC, the role of each component, and the meaning of each command.
