What to do when "terminating the instance due to error 481" prevents ASM from starting

2025-04-10 update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31 report

This article explains what to do when an ASM instance fails to start because PMON reports "terminating the instance due to error 481". It walks through the symptoms, the root cause analysis, and the fix.

1. Symptoms

Both nodes of an Oracle 11g cluster were shut down so that the hardware could be relocated.

After both nodes were booted again, node 1 was normal. On node 2 the ASM instance started, but it died shortly afterwards and the CRS service could not be started.

[oracle@shwmsdb1 ~]$ ps -ef | grep pmon
grid     14309     1  0 03:05 ?        00:00:01 asm_pmon_+ASM1
oracle   14382 14328  0 08:18 pts/1    00:00:00 grep pmon
oracle   15720     1  0 03:19 ?        00:00:06 ora_pmon_shwmsdb1

[oracle@shwmsdb2 ~]$ ps -ef | grep pmon
oracle   19298 19265  0 08:19 pts/1    00:00:00 grep pmon
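The check above can be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name has_asm_pmon is an illustrative choice, not an Oracle tool. It reads ps -ef output on standard input and succeeds only when an ASM PMON process is listed.

```shell
# Succeed if the `ps -ef` text on stdin lists an ASM PMON process.
# The bracket expression [+] matches a literal "+", and it also keeps this
# grep from matching its own command line when used in a pipeline.
has_asm_pmon() {
    grep -q 'asm_pmon_[+]ASM'
}

# Typical use on a cluster node:
#   ps -ef | has_asm_pmon && echo "ASM is up" || echo "ASM is down"
```

On the two nodes shown above, this would report ASM as up on shwmsdb1 and down on shwmsdb2.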

2. Root cause analysis

Starting the database instance on node 2 fails with the following errors:

SQL> startup nomount
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/shwmsdb/spfileshwmsdb.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATA/shwmsdb/spfileshwmsdb.ora
ORA-15077: could not locate ASM instance serving a required diskgroup

Check the ASM alert log.

On node 2, the following is logged during startup:

Fri Oct 27 03:43:07 2017
LMS0 started with pid=11, OS id=15250 at elevated priority
Fri Oct 27 03:43:07 2017
LMHB started with pid=12, OS id=15256
Fri Oct 27 03:43:07 2017
MMAN started with pid=13, OS id=15260
Fri Oct 27 03:43:07 2017
DBW0 started with pid=14, OS id=15264
Fri Oct 27 03:43:07 2017
LGWR started with pid=15, OS id=15268
Fri Oct 27 03:43:07 2017
CKPT started with pid=16, OS id=15272
Fri Oct 27 03:43:07 2017
SMON started with pid=17, OS id=15276
Fri Oct 27 03:43:07 2017
RBAL started with pid=18, OS id=15280
Fri Oct 27 03:43:07 2017
GMON started with pid=19, OS id=15284
Fri Oct 27 03:43:07 2017
MMON started with pid=20, OS id=15288
Fri Oct 27 03:43:07 2017
MMNL started with pid=21, OS id=15292
lmon registered with NM - instance number 2 (internal mem no 1)
Fri Oct 27 03:45:07 2017
PMON (ospid: 15212): terminating the instance due to error 481
Fri Oct 27 03:45:07 2017
ORA-1092: opitsk aborting process
Fri Oct 27 03:45:07 2017
System state dump requested by (instance=2, osid=15212 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_15230.trc
Dumping diagnostic data in directory=[cdmp_20171027034507], requested by (instance=2, osid=15212 (PMON)), summary=[abnormal instance termination].
Fri Oct 27 03:45:07 2017
ORA-1092: opitsk aborting process
Fri Oct 27 03:45:07 2017
License high water mark = 1
Instance terminated by PMON, pid = 15212
USER (ospid: 15331): terminating the instance
Instance terminated by USER, pid = 15331
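Alert logs are long, so it can help to pull out only the error number that PMON reports when it terminates an instance (error 481 is the one this article is about). The following is a minimal sketch, assuming a POSIX shell with sed; the function name termination_errors is illustrative, and the alert log path in the comment is an assumption inferred from the trace directory shown in this article.

```shell
# Print the numeric code from every "terminating the instance due to
# error N" line in the alert log text read from stdin, one code per line.
# Lines without a numeric code (for example terminations by USER) are ignored.
termination_errors() {
    sed -n 's/.*terminating the instance due to error \([0-9][0-9]*\).*/\1/p'
}

# Example (path is an assumption, adjust to the local diag directory):
#   termination_errors < /u01/app/grid/diag/asm/+asm/+ASM2/trace/alert_+ASM2.log
```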

ASM trace file:

/u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_15230.trc

Reconfiguration starts [incarn=0]
*** 2017-10-27 03:43:06.954
I'm the voting node
Group reconfiguration cleanup
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
*** 2017-10-27 03:43:08.186
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

ASM alert log on node 1:

LMON (ospid: 14339) detects hung instances during IMR reconfiguration
LMON (ospid: 14339) tries to kill the instance 2 in 37 seconds.
Please check instance 2's alert log and LMON trace file for more details.
Fri Oct 27 03:45:04 2017
Remote instance kill is issued with system inc 10
Remote instance kill map (size 1): 2
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 2
Reconfiguration started (old inc 10, new inc 12)

Compare the kernel routing tables of the two nodes:

[root@shwmsdb1 ~]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.123.0   0.0.0.0         255.255.255.0   U         0 0          0 eth2
10.0.0.0        0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         192.168.123.254 0.0.0.0         UG        0 0          0 eth2

[root@shwmsdb2 ~]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.123.0   0.0.0.0         255.255.255.0   U         0 0          0 eth2
10.0.0.0        0.0.0.0         255.255.0.0     U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth2
0.0.0.0         192.168.123.254 0.0.0.0         UG        0 0          0 eth2

Node 1 is missing one route: the 169.254.0.0 entry on eth2 that is present on node 2.
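Spotting a missing route by eye gets harder with longer tables. The following is a minimal sketch of the same comparison, assuming the netstat -rn output of each node has been saved to a file; the function names route_keys and missing_routes are illustrative, and mktemp and comm are assumed to be available.

```shell
# Print "destination interface" for each route in `netstat -rn` output on
# stdin. Header lines are skipped by keeping only rows that start with an
# IPv4 address. Output is sorted so it can be fed to comm.
route_keys() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $1, $NF }' | LC_ALL=C sort
}

# Print the routes found in file $2 but not in file $1.
# Both files are assumed to hold saved `netstat -rn` output.
missing_routes() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    route_keys < "$1" > "$tmp_a"
    route_keys < "$2" > "$tmp_b"
    LC_ALL=C comm -13 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example (file names are hypothetical):
#   missing_routes node1_routes.txt node2_routes.txt
```

With the two tables shown above, the only line reported for node 1 would be the 169.254.0.0 route on eth2.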

Further checking shows that the host network interface usb0 is dynamically obtaining an IP address in the 169.254.x.x range.

IBM PC servers use usb0 as a management network interface. While usb0 is not connected, it keeps requesting an address from DHCP. When no DHCP server is found, it is assigned a default 169.254.xxx.xxx address, which conflicts with Oracle HAIP (HAIP uses the same range) and causes the routing entry to be lost.

Comparing the log messages with the reference document shows that the symptoms here match the fault described in that document.

3. Solution

Add the missing route on node 1.

Execute the following as root on the node that's missing the HAIP route:

# route add -net 169.254.0.0 netmask 255.255.0.0 dev eth2
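Before changing routes on a production node, it may be worth confirming that the route really is absent. The following is a minimal dry-run sketch: it only prints the command and never executes it. The function name haip_route_fix is illustrative, and eth2 is the interface used in this article.

```shell
# Read `netstat -rn` output on stdin. If there is no 169.254.0.0 route on the
# interface named in $1, print the `route add` command that would create it.
# Nothing is executed, and nothing is printed when the route already exists.
haip_route_fix() {
    if awk -v ifc="$1" '$1 == "169.254.0.0" && $NF == ifc { f = 1 } END { exit !f }'; then
        return 0
    fi
    printf 'route add -net 169.254.0.0 netmask 255.255.0.0 dev %s\n' "$1"
}

# Example, on the node suspected of missing the route:
#   netstat -rn | haip_route_fix eth2
```

Printing instead of executing keeps the decision with the administrator, who can review the command and then run it as root.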

Then, on node 2, start ora.crsd as root on the node that's only partially up:

# $GRID_HOME/bin/crsctl start res ora.crsd -init

Here the Grid bin directory is on the PATH: PATH=$PATH:$HOME/bin:/u01/app/11.2.0/grid/bin

CRS on node 2 then starts normally.

On node 1, the other workaround was tried: restart GI on the node that's missing the HAIP route with the "crsctl stop crs -f" and "crsctl start crs" commands as root.

[root@shwmsdb2 bin]# ./crsctl stop crs -f

The command hung and was ended with Ctrl+C.

After that, CRS on node 1 remained in an abnormal state.

Running ps -ef | grep grid showed a hung grid process on node 1, which was killed:

kill -9 31307

After that, only the normal grid processes were left on the two nodes.

Shut down the CRS service on both nodes:

crsctl stop crs

The shutdown completes normally.

Start the CRS service on both nodes:

crsctl start crs

Once it is up, run:

ps -ef | grep grid
ps -ef | grep oracle
crsctl stat res -t

Everything looks normal.

The crs_stat -t output on both nodes is also normal.

[grid@shwmsdb2]$ crs_stat -t
Name           Type           Target   State    Host
ora.CRS.dg     ora....up.type ONLINE   ONLINE   shwmsdb1
ora.DATA.dg    ora....up.type ONLINE   ONLINE   shwmsdb1
ora.FRA.dg     ora....up.type ONLINE   ONLINE   shwmsdb1
ora....ER.lsnr ora....er.type ONLINE   ONLINE   shwmsdb1
ora....N1.lsnr ora....er.type ONLINE   ONLINE   shwmsdb2
ora.asm        ora.asm.type   ONLINE   ONLINE   shwmsdb1
ora.cvu        ora.cvu.type   ONLINE   ONLINE   shwmsdb2
ora....network ora....rk.type ONLINE   ONLINE   shwmsdb1
ora.oc4j       ora.oc4j.type  ONLINE   ONLINE   shwmsdb2
ora.ons        ora.ons.type   ONLINE   ONLINE   shwmsdb1
ora....ry.acfs ora....fs.type ONLINE   ONLINE   shwmsdb1
ora.scan1.vip  ora....ip.type ONLINE   ONLINE   shwmsdb2
ora.shwmsdb.db ora....se.type ONLINE   ONLINE   shwmsdb1
ora....SM1.asm application    ONLINE   ONLINE   shwmsdb1
ora....B1.lsnr application    ONLINE   ONLINE   shwmsdb1
ora....db1.ons application    ONLINE   ONLINE   shwmsdb1
ora....db1.vip ora....t1.type ONLINE   ONLINE   shwmsdb1
ora....SM2.asm application    ONLINE   ONLINE   shwmsdb2
ora....B2.lsnr application    ONLINE   ONLINE   shwmsdb2
ora....db2.ons application    ONLINE   ONLINE   shwmsdb2
ora....db2.vip ora....t1.type ONLINE   ONLINE   shwmsdb2
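Scanning a resource table for anything that is not ONLINE can be automated. The following is a minimal sketch that assumes the five-column crs_stat -t layout shown above (Name, Type, Target, State, Host); the function name not_online is illustrative, and the sketch does not handle the different layout printed by crsctl stat res -t.

```shell
# Read `crs_stat -t` output on stdin and print the name of every resource
# whose Target or State column is not ONLINE. The header row and any line
# with fewer than four fields are skipped. Prints nothing when every
# resource is ONLINE.
not_online() {
    awk '$1 != "Name" && NF >= 4 && ($3 != "ONLINE" || $4 != "ONLINE") { print $1 }'
}

# Example, as the grid user:
#   crs_stat -t | not_online
```

Empty output corresponds to the healthy state shown in the table above.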

At this point, CRS and ASM are running normally on both nodes.

4. Summary

IBM x3850 X5 series PC servers have a defect: the USB network interface has DHCP enabled, so the usb0 interface may end up occupying the address range that HAIP uses. For a RAC database running on such machines in production, turn off automatic DHCP address acquisition on usb0 and configure usb0 with a static IP.

In this case, the plan is to remove usb0 on both nodes.

[root@shwmsdb1 ~]# /sbin/ifdown usb0
[root@shwmsdb1 ~]# cd /etc/sysconfig/network-scripts
[root@shwmsdb1 network-scripts]# cat ifcfg-usb0
# IBM RNDIS/CDC ETHER
DEVICE=usb0
BOOTPROTO=dhcp
ONBOOT=no
HWADDR=5e:f3:fd:35:86:33
[root@shwmsdb1 network-scripts]# mv ifcfg-usb0 ifcfg-usb0.bak
[root@shwmsdb1 network-scripts]# ls
ifcfg-eth0      ifdown-bnep   ifdown-isdn    ifdown-sl      ifup-eth    ifup-ipx    ifup-ppp     ifup-wireless
ifcfg-eth2      ifdown-eth    ifdown-post    ifdown-tunnel  ifup-ib     ifup-isdn   ifup-routes  init.ipv6-global
ifcfg-lo        ifdown-ippp   ifdown-ppp     ifup           ifup-ippp   ifup-plip   ifup-sit     net.hotplug
ifcfg-usb0.bak  ifdown-ipsec  ifdown-routes  ifup-aliases   ifup-ipsec  ifup-plusb  ifup-sl      network-functions
ifdown          ifdown-ipv6   ifdown-sit     ifup-bnep      ifup-ipv6   ifup-post   ifup-tunnel  network-functions-ipv6
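Because the same rename has to be done on both nodes, it may be convenient to wrap it in a small function that is safe to run twice. The following is a minimal sketch mirroring the mv command above; the function name disable_ifcfg and its argument order are illustrative assumptions.

```shell
# Rename DIR/ifcfg-DEV to DIR/ifcfg-DEV.bak.
#   $1  directory holding the ifcfg files
#   $2  interface name, for example usb0
# Does nothing (and succeeds) if the file has already been renamed or removed.
disable_ifcfg() {
    cfg="$1/ifcfg-$2"
    [ -f "$cfg" ] || return 0
    mv "$cfg" "$cfg.bak"
}

# Example, as root, after `/sbin/ifdown usb0`:
#   disable_ifcfg /etc/sysconfig/network-scripts usb0
```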

[root@shwmsdb1 network-scripts]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 5C:F3:FC:DA:86:80
          inet addr:10.0.0.89  Bcast:10.0.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:65714 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15916 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5327553 (5.0 MiB)  TX bytes:1627321 (1.5 MiB)
          Interrupt:169 Memory:92000000-92012800

eth0:2    Link encap:Ethernet  HWaddr 5C:F3:FC:DA:86:80
          inet addr:10.0.0.90  Bcast:10.0.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:169 Memory:92000000-92012800

eth0:3    Link encap:Ethernet  HWaddr 5C:F3:FC:DA:86:80
          inet addr:10.0.0.100  Bcast:10.0.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:169 Memory:92000000-92012800

eth2      Link encap:Ethernet  HWaddr 5C:F3:FC:DA:86:82
          inet addr:192.168.123.1  Bcast:192.168.123.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1536228 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1539186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:729154172 (695.3 MiB)  TX bytes:801250137 (764.1 MiB)
          Interrupt:217 Memory:94000000-94012800

eth2:1    Link encap:Ethernet  HWaddr 5C:F3:FC:DA:86:82
          inet addr:169.254.66.26  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:217 Memory:94000000-94012800

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:529225 errors:0 dropped:0 overruns:0 frame:0
          TX packets:529225 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:137382526 (131.0 MiB)  TX bytes:137382526 (131.0 MiB)

usb0      Link encap:Ethernet  HWaddr 5E:F3:FD:35:86:33
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

After the server is restarted, usb0 no longer appears in the output of ifconfig -a.

Problem solved.
