[fault handling] A RAC fault handling process 07/06 Update SLTechnology News&Howtos


1.1 Fault environment

Item | Source db
DB type | 2-node RAC
DB version | 11.2.0.1.0
DB storage | ASM
OS version and kernel version | RHEL 6.6

1.2 Shortly after 10:00 p.m., a friend online asked me to help with a RAC cluster that would not start after an outage, and mentioned that multipathing and storage were involved. I do not know much about storage, have had little contact with multipathing, and have never studied it. But since they had come to me, I could not just ignore it, so I logged in to take a look. The result was miserable: I worked on it for many hours, asked many people for help, and finally fixed it at noon the next day. Fortunately, the next day was a weekend and I did not have to go to work. I am recording the process here in the hope that it helps others.

At first, CSS on node 1 could not start and reported many errors, and HAS on node 2 could not start properly either. I forgot to record the errors. I went through all kinds of logs, searched MOS, Baidu and Google, and even tried restoring the OCR. When nothing else worked, I fell back on my usual trick: re-running the root.sh script.

I have written about running this script many times on my blog, but it still takes practice, because there are many points to watch out for. First, if you want the disk group to survive, you can add the -keepdg option to the deconfig command ($ORACLE_HOME/crs/install/rootcrs.pl -deconfig -force -verbose), but 11.2.0.1 does not support it. Second, when deconfiguring the second node, leave out -lastnode so that as much information as possible is retained.
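The deconfig choices above can be captured in a small bash helper. This is a minimal sketch, not the author's script: it only prints the command, takes the grid home used in this post as a default, and assumes (from the remark above that 11.2.0.1 lacks it) that later releases accept -keepdg. Check the rootcrs.pl usage of your own release before relying on that.

```bash
#!/usr/bin/env bash
# Sketch: build (not run) the rootcrs.pl deconfig command for one node.
# Assumption: releases newer than 11.2.0.1 accept -keepdg; verify locally.

GRID_HOME="${GRID_HOME:-/u01/app/11.2.0/grid}"

# version_gt A B: succeeds when version A sorts strictly after version B.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# build_deconfig_cmd VERSION [keepdg]
# -lastnode is deliberately never added, so as much cluster information
# as possible is retained, as recommended above.
build_deconfig_cmd() {
  local version="$1" want_keepdg="${2:-}"
  local cmd="$GRID_HOME/crs/install/rootcrs.pl -deconfig -force -verbose"
  if [ "$want_keepdg" = "keepdg" ]; then
    if version_gt "$version" "11.2.0.1"; then
      cmd="$cmd -keepdg"
    else
      echo "note: -keepdg is not available in $version; disk groups are not protected" >&2
    fi
  fi
  printf '%s\n' "$cmd"
}

# Example: build_deconfig_cmd 11.2.0.1 keepdg
# Paste the printed line into a root shell on each node.
```

Printing instead of executing keeps a destructive command under the operator's eye, which matters when, as here, 10 TB of unbacked-up data sits on the same storage.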

Fortunately, after my first run the cluster started normally and everything was fine; that took from 10:00 p.m. to 1:00 a.m. However, importing the OCR backup requires starting CRS in exclusive (-excl) mode, and when I did that things went wrong again and the cluster broke. With no other option I restarted, which made things even worse: the OCR disk could no longer be found. I wanted to give up. Without the disk there was nothing I could do, so we needed someone who understood storage. It was almost 2:00 a.m., so I went to get some rest.

After 8:00 a.m. I logged on through TeamViewer and continued. First, we spent a long time fiddling with multipathing. It turned out that the multipath software on the second node was broken, so I reinstalled it myself. I expected to see the disks after the reinstall, but they were still missing. Out of options, I asked for a storage expert in leshami's group, and Boss Xiao looked at the storage and found the disks. Many thanks to him.

Then I resumed the recovery: deconfig again, then root.sh. After root.sh completed, the cluster was healthy, and it stayed healthy after a host reboot, so it seems the storage had been the culprit. Next came recovering the database, which was the crucial part. Throughout the whole operation I was careful not to touch any non-OCR disk, for fear of losing data, because there was no backup of the 10 TB of data, which left me speechless. I used kfod to check the disks, everything looked fine, so I mounted the disk groups directly. After root.sh has been re-run, any disk group whose disks are not corrupted can be mounted directly. This is also a way to recover when there is no OCR backup.
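Before mounting a data disk group it is worth confirming that kfod still sees every disk of that group with a MEMBER header. The bash sketch below reads kfod output in the format shown in section 1.2.3, counts the MEMBER disks of one group under /dev/ (so the duplicate ORCL: entries are not counted twice), and prints the mount statement only when the count matches what you expect. The function names and the expected-count argument are hypothetical, for illustration only.

```bash
#!/usr/bin/env bash
# Sketch: pre-mount check driven by `kfod disk=all s=true ds=true c=true` output.
# Expected line shape: "1: 476837 Mb MEMBER /dev/oracleasm/disks/DATA_VOL1 DATA grid asmadmin"

# count_members GROUP [PATH_PREFIX] < kfod_output
# Counts MEMBER disks of GROUP whose path starts with PATH_PREFIX (default /dev/).
count_members() {
  local group="$1" prefix="${2:-/dev/}"
  awk -v g="$group" -v p="$prefix" '
    $1 ~ /^[0-9]+:$/ && $4 == "MEMBER" && index($5, p) == 1 && $6 == g { n++ }
    END { print n + 0 }'
}

# ready_to_mount GROUP EXPECTED < kfod_output
# Prints the mount statement only if exactly EXPECTED member disks are visible.
ready_to_mount() {
  local group="$1" expected="$2" found
  found="$(count_members "$group")"
  if [ "$found" -eq "$expected" ]; then
    printf 'ALTER DISKGROUP %s MOUNT;\n' "$group"
  else
    echo "$group: expected $expected MEMBER disks, kfod shows $found; not mounting" >&2
    return 1
  fi
}
```

For example, `$ORACLE_HOME/bin/kfod disk=all s=true ds=true c=true | ready_to_mount DATA 17` would print the statement for the 17 DATA disks listed in section 1.2.3, to be run in `sqlplus / as sysasm`.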

After that everything went smoothly: configuring the listener, adding the database to srvctl, and so on. Truly blessed by Buddha. Many of the processing logs were not recorded, so only a few scripts can be given here.

1.2.1 Some scripts used in the process

Before re-running root.sh, check carefully whether any database data is stored in the OCR disk group, because the procedure below wipes the OCR disks and recreates that disk group. If data is stored there, do not run the script casually.

1. Run deconfig on node 1 and on node 2:

export ORACLE_HOME=/u01/app/11.2.0/grid

export PATH=$PATH:$ORACLE_HOME/bin

$ORACLE_HOME/crs/install/rootcrs.pl -deconfig -force -verbose

2. After deconfig has finished, wipe the OCR disks with dd on all nodes:

dd if=/dev/zero of=/dev/oracleasm/disks/OCR_VOL2 bs=1024k count=1024

dd if=/dev/zero of=/dev/oracleasm/disks/OCR_VOL1 bs=1024k count=1024

3. Run root.sh on node 1 first; after it has finished, run it on node 2:

export ORACLE_HOME=/u01/app/11.2.0/grid

$ORACLE_HOME/root.sh
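Step 2 is the only step that destroys data, so a guard around it is cheap insurance. This bash sketch is a hypothetical helper, not from the post: it refuses any device whose file name does not start with OCR_VOL (this system's naming, an assumption anywhere else), and it only prints the dd command unless CONFIRM_WIPE=yes is set.

```bash
#!/usr/bin/env bash
# Sketch: guarded version of step 2. Dry run by default.

# wipe_ocr_disk DEVICE
wipe_ocr_disk() {
  local dev="$1"
  # Only devices named like OCR_VOL1, OCR_VOL2, ... are accepted.
  case "${dev##*/}" in
    OCR_VOL*) ;;
    *) echo "refusing to wipe $dev: not an OCR_VOL* device" >&2; return 1 ;;
  esac
  if [ "${CONFIRM_WIPE:-no}" = "yes" ]; then
    # Same command as step 2: zero the first 1 GB of the device.
    dd if=/dev/zero of="$dev" bs=1024k count=1024
  else
    printf 'dry run: dd if=/dev/zero of=%s bs=1024k count=1024\n' "$dev"
  fi
}

# Example: review the dry run first, then repeat with CONFIRM_WIPE=yes.
#   wipe_ocr_disk /dev/oracleasm/disks/OCR_VOL1
#   wipe_ocr_disk /dev/oracleasm/disks/OCR_VOL2
```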

In addition, running root.sh on 11.2.0.1 often hits the following known bug:

CRS-4124: Oracle High Availability Services startup failed.

CRS-4000: Command Start failed, or completed with errors.

ohasd failed to start: Inappropriate ioctl for device

ohasd failed to start: Inappropriate ioctl for device at /u01/app/11.2.0/grid/crs/install/roothas.pl line 296.

The workaround for this error is to run the following command from a second session while root.sh is running:

/bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1

If it reports

/bin/dd: opening `/var/tmp/.oracle/npohasd': No such file or directory

the file has not been created yet. Keep re-running the command until it succeeds; generally, the right moment is when root.sh prints the message "Adding daemon to inittab".
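Re-running the dd command by hand until the file appears can be automated with a small polling loop. This bash sketch is an illustration under assumptions rather than Oracle's fix: the function name and timeout are hypothetical, it waits for /var/tmp/.oracle/npohasd (or a path you pass), and then runs the same dd command as above. Start it as root in a second session just before running root.sh.

```bash
#!/usr/bin/env bash
# Sketch: wait for npohasd to appear, then run the workaround dd once.

# unblock_ohasd [FILE] [TIMEOUT_SECONDS]
unblock_ohasd() {
  local fifo="${1:-/var/tmp/.oracle/npohasd}" timeout="${2:-600}" waited=0
  # Poll once a second until root.sh has created the file.
  while [ ! -e "$fifo" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up: $fifo did not appear within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # Same command as the workaround above.
  /bin/dd if="$fifo" of=/dev/null bs=1024 count=1
}

# Example (second root session, before starting root.sh in the first one):
#   unblock_ohasd
```

Polling once a second keeps the sketch simple; the timeout only prevents the loop from running forever if root.sh fails before it creates the file.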

1.2.2 Configuration script used by root.sh

Some of the settings that root.sh uses, including the name of the OCR disk group to be created and the disk paths, are kept in the following script:

$ORACLE_HOME/crs/config/config.sh

1.2.3 The kfod command

This command displays information about all disks:

data01-> export ORACLE_HOME=/u01/app/11.2.0/grid

data01-> $ORACLE_HOME/bin/kfod disk=all s=true ds=true c=true

Disk Size Header Path Disk Group User Group
1: 476837 Mb MEMBER /dev/oracleasm/disks/DATA_VOL1 DATA grid asmadmin
2: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL10 DATA grid asmadmin
3: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL11 DATA grid asmadmin
4: 953675 Mb MEMBER /dev/oracleasm/disks/DATA_VOL12 DATA grid asmadmin
5: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL13 DATA grid asmadmin
6: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL14 DATA grid asmadmin
7: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL15 DATA grid asmadmin
8: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL16 DATA grid asmadmin
9: 953675 Mb MEMBER /dev/oracleasm/disks/DATA_VOL18 DATA grid asmadmin
10: 953675 Mb MEMBER /dev/oracleasm/disks/DATA_VOL2 DATA grid asmadmin
11: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL3 DATA grid asmadmin
12: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL4 DATA grid asmadmin
13: 953675 Mb MEMBER /dev/oracleasm/disks/DATA_VOL5 DATA grid asmadmin
14: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL6 DATA grid asmadmin
15: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL7 DATA grid asmadmin
16: 953674 Mb MEMBER /dev/oracleasm/disks/DATA_VOL8 DATA grid asmadmin
17: 953675 Mb MEMBER /dev/oracleasm/disks/DATA_VOL9 DATA grid asmadmin
18: 476837 Mb MEMBER /dev/oracleasm/disks/FLASH_VOL1 FLASH grid asmadmin
19: 286103 Mb MEMBER /dev/oracleasm/disks/FLASH_VOL2 FLASH grid asmadmin
20: 286057 Mb MEMBER /dev/oracleasm/disks/OCR_VOL1 OCR grid asmadmin
21: 286102 Mb CANDIDATE /dev/oracleasm/disks/OCR_VOL2 # grid asmadmin
22: 476837 Mb MEMBER ORCL:DATA_VOL1 DATA
23: 953674 Mb MEMBER ORCL:DATA_VOL10 DATA
24: 953674 Mb MEMBER ORCL:DATA_VOL11 DATA
25: 953675 Mb MEMBER ORCL:DATA_VOL12 DATA
26: 953674 Mb MEMBER ORCL:DATA_VOL13 DATA
27: 953674 Mb MEMBER ORCL:DATA_VOL14 DATA
28: 953674 Mb MEMBER ORCL:DATA_VOL15 DATA
29: 953674 Mb MEMBER ORCL:DATA_VOL16 DATA
30: 953675 Mb MEMBER ORCL:DATA_VOL18 DATA
31: 953675 Mb MEMBER ORCL:DATA_VOL2 DATA
32: 953674 Mb MEMBER ORCL:DATA_VOL3 DATA
33: 953674 Mb MEMBER ORCL:DATA_VOL4 DATA
34: 953675 Mb MEMBER ORCL:DATA_VOL5 DATA
35: 953674 Mb MEMBER ORCL:DATA_VOL6 DATA
36: 953674 Mb MEMBER ORCL:DATA_VOL7 DATA
37: 953674 Mb MEMBER ORCL:DATA_VOL8 DATA
38: 953675 Mb MEMBER ORCL:DATA_VOL9 DATA
39: 476837 Mb MEMBER ORCL:FLASH_VOL1 FLASH
40: 286103 Mb MEMBER ORCL:FLASH_VOL2 FLASH
41: 286057 Mb MEMBER ORCL:OCR_VOL1 OCR
42: 286102 Mb CANDIDATE ORCL:OCR_VOL2 #

ORACLE_SID ORACLE_HOME HOST_NAME
+ASM1 /u01/app/11.2.0/grid data01
+ASM2 /u01/app/11.2.0/grid data02

data01->

data01-> sqlplus / as sysasm

SQL*Plus: Release 11.2.0.1.0 Production on Sat Dec 10 12:27:25 2016

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL>

SQL> alter diskgroup OCR add disk '/dev/oracleasm/disks/OCR_VOL2';

Diskgroup altered.
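In the kfod listing above, OCR_VOL2 shows a CANDIDATE header and no disk group, which is why it was added back to OCR. Below is a bash sketch of picking such disks out of kfod output and generating the matching statement; the helper names are hypothetical, and matching disks by a GROUP_VOL name prefix is an assumption based on this system's naming.

```bash
#!/usr/bin/env bash
# Sketch: find CANDIDATE disks in kfod output and print ADD DISK statements.

# candidate_disks [NAME_PATTERN] < kfod_output
# Prints /dev paths (not ORCL: aliases) of disks with a CANDIDATE header.
candidate_disks() {
  local pattern="${1:-}"
  awk -v pat="$pattern" '
    $1 ~ /^[0-9]+:$/ && $4 == "CANDIDATE" && $5 ~ /^\// && $5 ~ pat { print $5 }'
}

# add_disk_sql GROUP < kfod_output
# Assumes candidate disks for GROUP are named GROUP_VOLn, as on this system.
add_disk_sql() {
  local group="$1"
  candidate_disks "${group}_VOL" | while IFS= read -r disk; do
    printf "ALTER DISKGROUP %s ADD DISK '%s';\n" "$group" "$disk"
  done
}

# Example:
#   $ORACLE_HOME/bin/kfod disk=all s=true ds=true c=true | add_disk_sql OCR
```

The printed statement is meant to be reviewed and then run in `sqlplus / as sysasm`; generating rather than executing it keeps the final decision with the DBA.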

1.2.4 Adding the database to srvctl

srvctl in 11.2.0.1 has no -c option, so drop -c RAC from the first command below on that version. Use -h to see the full usage:

srvctl add database -d DGPHY -c RAC -o /oracle/app/oracle/product/11.2.0/db -p '+DATA/TESTDGPHY/PARAMETERFILE/spfiledgphy.ora' -r primary -n TESTDG

srvctl add instance -d DGPHY -i DGPHY1 -n ZFZHLHRDB1

srvctl add instance -d DGPHY -i DGPHY2 -n ZFZHLHRDB2

srvctl status database -d DGPHY

srvctl start database -d DGPHY
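When several instances have to be registered, generating the srvctl commands from one list keeps the names consistent. This bash sketch only prints the commands; it assumes instance names are the unique name plus 1, 2, ... in node order (as with DGPHY1 and DGPHY2 above) and omits -c because 11.2.0.1 does not accept it. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print srvctl registration commands for a RAC database (dry run).

# print_srvctl_registration UNIQUE_NAME DB_NAME ORACLE_HOME SPFILE NODE...
print_srvctl_registration() {
  local unique="$1" dbname="$2" home="$3" spfile="$4"
  shift 4
  # No -c option: srvctl in 11.2.0.1 does not accept it.
  printf "srvctl add database -d %s -o %s -p '%s' -r primary -n %s\n" \
    "$unique" "$home" "$spfile" "$dbname"
  # One instance per node: UNIQUE_NAME1 on the first node, UNIQUE_NAME2 on the second, ...
  local i=1 node
  for node in "$@"; do
    printf 'srvctl add instance -d %s -i %s%d -n %s\n' "$unique" "$unique" "$i" "$node"
    i=$((i + 1))
  done
  # srvctl start/status take the unique name registered with -d.
  printf 'srvctl start database -d %s\n' "$unique"
  printf 'srvctl status database -d %s\n' "$unique"
}

# Example with the values used above:
#   print_srvctl_registration DGPHY TESTDG /oracle/app/oracle/product/11.2.0/db \
#     '+DATA/TESTDGPHY/PARAMETERFILE/spfiledgphy.ora' ZFZHLHRDB1 ZFZHLHRDB2
```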

About Me

● Author: Wheat Seedling, focused on database technology with an emphasis on practical application

● This article is published simultaneously on itpub (http://blog.itpub.net/26736162), cnblogs (http://www.cnblogs.com/lhrbest) and my personal WeChat official account (xiaomaimiaolhr).

● itpub address of this article: http://blog.itpub.net/26736162/viewspace-2130218/

● cnblogs address of this article: http://www.cnblogs.com/lhrbest/p/6157931.html

● PDF version of this article and Wheat Seedling's cloud drive: http://blog.itpub.net/26736162/viewspace-1624453/

● QQ group: 230161599; WeChat group: ask me in a private chat

● To contact me, add me as a QQ friend (642808185) and state why you are adding me

● Written at Taixing apartment between 22:00 on 2016-12-09 and 16:00 on 2016-12-10.

● The content of this article comes from my study notes, and part of it was compiled from the Internet. If anything here infringes a right or is inappropriate, please forgive me.
