2025-03-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
What steps should you follow to check a CRS startup problem? Many newcomers are unsure, so this article walks through the checks in detail, based on a real failure.
10g RAC: CRS service failure caused by device files
A planned full power outage at the IBM CSC center (hosts and storage were reinitialized) caused a RAC failure in the test environment. Troubleshooting during the recovery taught us a lot about OCR that we had never paid attention to before.
System structure: 2-node Oracle 10gR2 RAC
Host system: P570, AIX 5300
Storage: DS4800
After the hosts were restarted, one RAC node failed: the CRS service was unavailable, attempts to restart it failed, and crsd.log recorded no new entries, as shown below:
-
# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
# ps -ef | grep d.bin
root 176436 127580 0 13:39:03 pts/3 0:00 grep d.bin
# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
#
-
During the investigation, problems surfaced one after another, covering most of the failures a CRS service can run into on the AIX platform. After a full day of troubleshooting, it would be a pity not to write up a summary.
OCR and the voting disk are the most important device files for the CRS service, so when a CRS problem appears, start by checking the hdisk hardware devices behind these two device files.
1. Check the OCR and voting disk device file
When the CRS service fails to start on a RAC node, first check whether the OCR and voting disk devices are available and consistent on every node.
Check that the OCR and voting disk device files are normal on each RAC node, as follows:
Check the status of the OCR device file (run as root):
# ocrcheck
Check the status of the voting disk device file (run as oracle):
# crsctl query css votedisk
Note that /dev/rhdisk* is only the device file name the database refers to, so confirm that the device identified on both nodes is the same physical hdisk. On AIX, use "lscfg -vl hdisk*" to check the serial number of each hdisk.
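The cross-node comparison described above can be sketched as a small shell helper. This is only a sketch: the function names serial_of and same_disk are hypothetical, and it assumes the lscfg output contains a line of the form "Serial Number....VALUE", which may differ depending on the storage driver.

```shell
# Hypothetical helper: read `lscfg -vl hdiskN` output on stdin and print
# the value of the first "Serial Number....VALUE" line (assumed format).
serial_of() {
    awk '/Serial Number/ {
        sub(/^.*Serial Number\.*/, "")   # drop the label and the dot padding
        sub(/[ \t]+$/, "")               # drop trailing whitespace
        print
        exit
    }'
}

# Hypothetical helper: succeed only if both serials are non-empty and equal.
same_disk() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example use: run on each node, then compare the two values by hand
# or pass them to same_disk:
#   lscfg -vl hdisk22 | serial_of
```

Treating an empty serial as a mismatch is deliberate: if the line cannot be found, the check should fail rather than silently pass.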
2. Check the permissions and groups of OCR and voting disk device files
Before installing the CRS service on AIX, grant the following owner, group and permissions to the OCR and voting disk devices:
OCR device
chown root:dba /dev/rhdisk_OCR
chmod 660 /dev/rhdisk_OCR
Voting disk device
chown oracle:dba /dev/rhdisk_votedisk
chmod 660 /dev/rhdisk_votedisk
During troubleshooting, the read and write permissions of the devices are easy to overlook. One of the problems I hit was an OCR failure caused by the wrong group and permissions on the CRS device. The evidence is as follows:
-
Failed node OCR device
oracle@clostb1/oracle > ls -la /dev/rhdisk22
crw- 1 root system 20, 23 May 28 11:44 /dev/rhdisk22
Normal node OCR device
[oracle@clostb2#] ls -la /dev/rhdisk22
crw-r- 1 root dba 36, 23 May 28 14:57 /dev/rhdisk22
-
After root corrected the group and permissions of the OCR device on the failed node, the CRS service restarted normally.
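A check like this is easy to script so that it is not forgotten. The following is a minimal sketch, assuming a standard "ls -l" field layout (mode, links, owner, group); the function name check_dev_line is hypothetical, and the mode string crw-rw---- is simply what chmod 660 produces on a character device.

```shell
# Hypothetical helper: read one `ls -l` line on stdin and succeed only if
# the mode string, owner and group all match the expected values.
#   $1 = expected owner, $2 = expected group, $3 = expected mode string
check_dev_line() {
    awk -v o="$1" -v g="$2" -v m="$3" '
        { ok = ($1 == m && $3 == o && $4 == g) }
        END { exit ok ? 0 : 1 }'
}

# Example use on the OCR device from this article:
#   ls -l /dev/rhdisk22 | check_dev_line root dba crw-rw---- \
#       || echo "OCR device owner/group/mode mismatch"
```

Running the same check on every node makes a difference such as root:system versus root:dba visible immediately.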
3. Check the MPIO property of the OCR and voting disk device
While recovering the failed node, I found that once its CRS service had been repaired by fixing the device's group and permissions, the CRS service on the other node became abnormal, as shown below:
-
# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
#
# ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
#
[oracle@clostb2#] crsctl query css votedisk
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Invalid argument] [22]
[oracle@clostb2#]
-
Checking the OCR device file showed that the MPIO attributes of the same disk differed between the two nodes: on the failed node, the reserve_policy attribute of the OCR device was set to single_path, which caused the CRS service failure.
MetaLink provides an explanation and the commands for setting the MPIO attributes of OCR and voting disk devices, as follows:
-
To allow concurrent IO access to this disk device and prevent the device driver from locking the hdisks with a reservation on open, a no reservation flag must be set. Use the following chdev command to disable this reservation.
All MPIO-capable (ESS, DS8000, DS6000 devices):
chdev -l hdiskn -a reserve_policy=no_reserve
chdev -l hdiskm -a reserve_policy=no_reserve
For EMC (Symmetrix & Clariion), HDS, IBM DS4000, and non-MPIO capable devices, perform the following:
chdev -l hdiskn -a reserve_lock=no
chdev -l hdiskm -a reserve_lock=no
-
On AIX, use the "lsattr -El hdiskN" command to check the attributes of an hdisk device. After reserve_policy was changed to no_reserve, the CRS service on the failed node returned to normal. The command is as follows:
# chdev -l hdisk22 -a reserve_policy=no_reserve
Note: also check the disks used by the voting disk and ASM, and set reserve_policy=no_reserve on them.
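To compare the reservation setting across nodes before changing anything, the attribute can be pulled out of the lsattr output. This is a sketch only: the function name reserve_value is hypothetical, and it assumes the usual lsattr -El layout in which the attribute name is the first column and its current value the second.

```shell
# Hypothetical helper: read `lsattr -El hdiskN` output on stdin and print
# the reservation attribute as name=value (reserve_policy or reserve_lock,
# depending on the device type).
reserve_value() {
    awk '$1 == "reserve_policy" || $1 == "reserve_lock" { print $1 "=" $2 }'
}

# Example use, run on every node for each shared disk:
#   lsattr -El hdisk22 | reserve_value
```

Any shared disk that reports something other than reserve_policy=no_reserve (or reserve_lock=no) is a candidate for the chdev commands quoted above.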
4. Check the configuration file ocr.loc of the OCR device
The ocr.loc file is created when the root.sh script runs during CRS installation. It is normally stored under /etc/oracle/ and records the OCR device information used when the CRS service starts, as follows:
-
# ls -trl /etc/oracle/ocr.loc
-rw-r--r-- 1 root dba 45 Apr 08 14:16 /etc/oracle/ocr.loc
# cat /etc/oracle/ocr.loc
ocrconfig_loc=/dev/rhdisk22
local_only=FALSE
#
-
If the disk named in ocr.loc does not match the actual disk, or if the file has been emptied, the CRS service fails and the log records that the OCR device cannot be accessed. I once hit the case where the file had been emptied and CRS could not start, and it took me a long time to find.
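Because an emptied ocr.loc is easy to miss, a quick sanity check may help. The sketch below assumes the file uses the ocrconfig_loc=... line shown in the listing; the function name ocr_device is hypothetical.

```shell
# Hypothetical helper: print the OCR device named in an ocr.loc file.
# Fails if the file is missing, empty, or has no ocrconfig_loc value.
#   $1 = path to ocr.loc
ocr_device() {
    [ -s "$1" ] || return 1
    # grep . passes the value through and fails when nothing was extracted
    sed -n 's/^ocrconfig_loc=//p' "$1" | head -n 1 | grep .
}

# Example use: confirm the named device exists as a character device.
#   dev=$(ocr_device /etc/oracle/ocr.loc) && [ -c "$dev" ] \
#       || echo "ocr.loc is empty or points to a missing device"
```

This only confirms that the file names an existing device; whether it is the right disk still has to be verified against the other node.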