What to do when "terminating the instance due to error 481" causes ASM to fail to start
This article walks through a case in which PMON terminated an ASM instance with "terminating the instance due to error 481" and the CRS stack on that node could not be started, and shows how the fault was diagnosed and resolved.
1. Phenomenon
An Oracle 11g RAC cluster was shut down on both nodes so the hardware could be relocated.
After power-on, node 1 came up normally. On node 2 the ASM instance started, but it died shortly afterwards and the CRS stack could not be brought up.
[oracle@shwmsdb1 ~]$ ps -ef | grep pmon
grid     14309     1  0 03:05 ?        00:00:01 asm_pmon_+ASM1
oracle   14382 14328  0 08:18 pts/1    00:00:00 grep pmon
oracle   15720     1  0 03:19 ?        00:00:06 ora_pmon_shwmsdb1
[oracle@shwmsdb2 ~]$ ps -ef | grep pmon
oracle   19298 19265  0 08:19 pts/1    00:00:00 grep pmon
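Besides checking for pmon processes, the state of the lower-stack Grid Infrastructure resources on node 2 can be inspected with the standard crsctl tooling (run from the Grid home's bin directory as root or as the grid owner; the resource names mentioned are the usual 11.2 ones and the exact output is environment specific):
crsctl check crs
crsctl stat res -t -init
The second command lists the per-node stack resources such as ora.asm, ora.cssd, ora.crsd and ora.cluster_interconnect.haip; the last of these is the HAIP resource that turns out to be at the center of this case.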
2. Cause analysis
Starting the database instance on node 2 reports an error:
SQL> startup nomount
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/shwmsdb/spfileshwmsdb.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATA/shwmsdb/spfileshwmsdb.ora
ORA-15077: could not locate ASM instance serving a required diskgroup
Check the ASM alert log.
On node 2, the ASM alert log reports the following during startup:
Fri Oct 27 03:43:07 2017
LMS0 started with pid=11, OS id=15250 at elevated priority
Fri Oct 27 03:43:07 2017
LMHB started with pid=12, OS id=15256
Fri Oct 27 03:43:07 2017
MMAN started with pid=13, OS id=15260
Fri Oct 27 03:43:07 2017
DBW0 started with pid=14, OS id=15264
Fri Oct 27 03:43:07 2017
LGWR started with pid=15, OS id=15268
Fri Oct 27 03:43:07 2017
CKPT started with pid=16, OS id=15272
Fri Oct 27 03:43:07 2017
SMON started with pid=17, OS id=15276
Fri Oct 27 03:43:07 2017
RBAL started with pid=18, OS id=15280
Fri Oct 27 03:43:07 2017
GMON started with pid=19, OS id=15284
Fri Oct 27 03:43:07 2017
MMON started with pid=20, OS id=15288
Fri Oct 27 03:43:07 2017
MMNL started with pid=21, OS id=15292
lmon registered with NM - instance number 2 (internal mem no 1)
Fri Oct 27 03:45:07 2017
PMON (ospid: 15212): terminating the instance due to error 481
Fri Oct 27 03:45:07 2017
ORA-1092: opitsk aborting process
Fri Oct 27 03:45:07 2017
System state dump requested by (instance=2, osid=15212 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_15230.trc
Dumping diagnostic data in directory=[cdmp_20171027034507], requested by (instance=2, osid=15212 (PMON)), summary=[abnormal instance termination].
Fri Oct 27 03:45:07 2017
ORA-1092: opitsk aborting process
Fri Oct 27 03:45:07 2017
License high water mark = 1
Instance terminated by PMON, pid = 15212
USER (ospid: 15331): terminating the instance
Instance terminated by USER, pid = 15331
The ASM diag trace file:
/u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_15230.trc
Reconfiguration starts [incarn=0]
*** 2017-10-27 03:43:15.069
I'm the voting node
Group reconfiguration cleanup
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
*** 2017-10-27 03:43:15.081
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
The ASM alert log on node 1 shows:
LMON (ospid: 14339) detects hung instances during IMR reconfiguration
LMON (ospid: 14339) tries to kill the instance 2 in 37 seconds.
Please check instance 2's alert log and LMON trace file for more details.
Fri Oct 27 03:45:04 2017
Remote instance kill is issued with system inc 10
Remote instance kill map (size 1): 2
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 2
Reconfiguration started (old inc 10, new inc 12)
[root@shwmsdb1 ~]# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.123.0 0.0.0.0 255.255.255.0 U 0 0 0 eth2
10.0.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
0.0.0.0 192.168.123.254 0.0.0.0 UG 0 0 0 eth2
[root@shwmsdb2 ~]# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.123.0 0.0.0.0 255.255.255.0 U 0 0 0 eth2
10.0.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth2
0.0.0.0 192.168.123.254 0.0.0.0 UG 0 0 0 eth2
Node 1 is missing a route: it lacks the 169.254.0.0/16 entry on eth2 that node 2 has.
Further inspection shows that the host's usb0 network interface is dynamically obtaining an IP address in the 169.254.x.x range.
IBM PC servers use usb0 as a management network interface. When usb0 is not connected, it keeps requesting an address from DHCP; if no DHCP server answers, it falls back to a 169.254.x.x address by default. That conflicts with Oracle HAIP, which uses the same 169.254.0.0/16 link-local range on the private interconnect, and the HAIP route is lost as a result.
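To confirm which interface and which 169.254.x.x (HAIP) address Oracle actually uses for the interconnect, a couple of standard checks can be run on a node whose ASM instance is still up (node 1 here), as the grid owner with the ASM environment set; the output is of course specific to each cluster:
oifcfg getif
sqlplus / as sysasm
SQL> select inst_id, name, ip_address from gv$cluster_interconnects;
oifcfg getif reports which network is registered as the cluster_interconnect (eth2 / 192.168.123.0 in this environment), and gv$cluster_interconnects shows the 169.254.x.x HAIP address each instance has bound on that interface, which should match the eth2:1 alias (169.254.66.26) visible in the ifconfig output later in this article.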
Comparing the various log entries with the information in the reference document confirms that the symptoms match the fault described there.
3. Solution
Add the missing routing information to node 1.
Execute the following as root on the node that's missing the HAIP route:
# route add -net 169.254.0.0 netmask 255.255.0.0 dev eth2
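Once the route is added, the routing table on node 1 should show the same 169.254.0.0/16 entry on eth2 that node 2 already has:
# netstat -rn | grep 169.254
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth2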
Then, on node 2, start ora.crsd as root on the node that's partially up:
# $GRID_HOME/bin/crsctl start res ora.crsd -init
(For the grid user the Grid home is already on the PATH: PATH=$PATH:$HOME/bin:/u01/app/11.2.0/grid/bin.)
CRS on node 2 then starts normally.
For node 1, the other workaround is to restart GI on the node that's missing the HAIP route with "crsctl stop crs -f" and "crsctl start crs", run as root:
[root@shwmsdb2 bin]# ./crsctl stop crs -f
The command hung and was interrupted with Ctrl+C.
However, CRS on node 1 was still abnormal.
ps -ef | grep grid showed a stuck grid process on node 1, which was killed:
kill -9 31307
After that, only the normal grid processes were left on the two nodes.
Stop the CRS stack on both nodes:
crsctl stop crs
It shuts down cleanly. Then start CRS on both nodes:
crsctl start crs
Once CRS is up, check:
ps -ef | grep grid
ps -ef | grep oracle
crsctl stat res -t
Everything looks normal.
crs_stat -t on both nodes is also normal:
[grid@shwmsdb2]$ crs_stat -t
Name           Type           Target    State     Host
ora.CRS.dg     ora....up.type ONLINE    ONLINE    shwmsdb1
ora.DATA.dg    ora....up.type ONLINE    ONLINE    shwmsdb1
ora.FRA.dg     ora....up.type ONLINE    ONLINE    shwmsdb1
ora....ER.lsnr ora....er.type ONLINE    ONLINE    shwmsdb1
ora....N1.lsnr ora....er.type ONLINE    ONLINE    shwmsdb2
ora.asm        ora.asm.type   ONLINE    ONLINE    shwmsdb1
ora.cvu        ora.cvu.type   ONLINE    ONLINE    shwmsdb2
ora....network ora....rk.type ONLINE    ONLINE    shwmsdb1
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    shwmsdb2
ora.ons        ora.ons.type   ONLINE    ONLINE    shwmsdb1
ora....ry.acfs ora....fs.type ONLINE    ONLINE    shwmsdb1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    shwmsdb2
ora.shwmsdb.db ora....se.type ONLINE    ONLINE    shwmsdb1
ora....SM1.asm application    ONLINE    ONLINE    shwmsdb1
ora....B1.lsnr application    ONLINE    ONLINE    shwmsdb1
ora....db1.ons application    ONLINE    ONLINE    shwmsdb1
ora....db1.vip ora....t1.type ONLINE    ONLINE    shwmsdb1
ora....SM2.asm application    ONLINE    ONLINE    shwmsdb2
ora....B2.lsnr application    ONLINE    ONLINE    shwmsdb2
ora....db2.ons application    ONLINE    ONLINE    shwmsdb2
ora....db2.vip ora....t1.type ONLINE    ONLINE    shwmsdb2
At this point, CRS and ASM are normal on both nodes.
4. Fault summary
IBM x3850 X5 series servers have a defect: DHCP is enabled on the usb0 management interface by default, so usb0 can end up occupying an address that collides with HAIP. When usb0 is not connected it keeps requesting an address via DHCP and, if none is available, falls back to a 169.254.x.x address, which conflicts with Oracle HAIP. Production RAC environments running on such machines should disable automatic DHCP on usb0 and either give it a static IP or remove its configuration.
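If the usb0 management interface needs to remain usable rather than being removed outright, a static configuration along the following lines would also prevent the 169.254.x.x fallback; the IPADDR and NETMASK values below are placeholders, not addresses taken from this environment:
# /etc/sysconfig/network-scripts/ifcfg-usb0 (sketch; IPADDR/NETMASK are placeholders)
DEVICE=usb0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.200.10
NETMASK=255.255.255.0
HWADDR=5e:f3:fd:35:86:33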
Here, the usb0 configuration is simply removed on both nodes.
[root@shwmsdb1 ~]# /sbin/ifdown usb0
[root@shwmsdb1 ~]# cd /etc/sysconfig/network-scripts
[root@shwmsdb1 network-scripts]# cat ifcfg-usb0
# IBM RNDIS/CDC ETHER
DEVICE=usb0
BOOTPROTO=dhcp
ONBOOT=no
HWADDR=5e:f3:fd:35:86:33
[root@shwmsdb1 network-scripts]# mv ifcfg-usb0 ifcfg-usb0.bak
[root@shwmsdb1 network-scripts]# ls
ifcfg-eth0      ifdown-bnep   ifdown-isdn    ifdown-sl      ifup-eth     ifup-ipx    ifup-ppp      ifup-wireless
ifcfg-eth2      ifdown-eth    ifdown-post    ifdown-tunnel  ifup-ib      ifup-isdn   ifup-routes   init.ipv6-global
ifcfg-lo        ifdown-ippp   ifdown-ppp     ifup           ifup-ippp    ifup-plip   ifup-sit      net.hotplug
ifcfg-usb0.bak  ifdown-ipsec  ifdown-routes  ifup-aliases   ifup-ipsec   ifup-plusb  ifup-sl       network-functions
ifdown          ifdown-ipv6   ifdown-sit     ifup-bnep      ifup-ipv6    ifup-post   ifup-tunnel   network-functions-ipv6
[root@shwmsdb1 network-scripts]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 5C:F3:FC:DA:86:80
          inet addr:10.0.0.89  Bcast:10.0.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:65714 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15916 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5327553 (5.0 MiB)  TX bytes:1627321 (1.5 MiB)
          Interrupt:169 Memory:92000000-92012800
eth0:2    Link encap:Ethernet  HWaddr 5C:F3:FC:DA:86:80
          inet addr:10.0.0.90  Bcast:10.0.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:169 Memory:92000000-92012800
eth0:3    Link encap:Ethernet  HWaddr 5C:F3:FC:DA:86:80
          inet addr:10.0.0.100  Bcast:10.0.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:169 Memory:92000000-92012800
eth2      Link encap:Ethernet  HWaddr 5C:F3:FC:DA:86:82
          inet addr:192.168.123.1  Bcast:192.168.123.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1536228 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1539186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:729154172 (695.3 MiB)  TX bytes:801250137 (764.1 MiB)
          Interrupt:217 Memory:94000000-94012800
eth2:1    Link encap:Ethernet  HWaddr 5C:F3:FC:DA:86:82
          inet addr:169.254.66.26  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:217 Memory:94000000-94012800
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:529225 errors:0 dropped:0 overruns:0 frame:0
          TX packets:529225 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:137382526 (131.0 MiB)  TX bytes:137382526 (131.0 MiB)
usb0      Link encap:Ethernet  HWaddr 5E:F3:FD:35:86:33
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
After the server is restarted, usb0 no longer appears in ifconfig -a.
Problem solved.