2025-04-12 Update From: SLTechnology News&Howtos
Symptom:
After a node goes down, it cannot be restarted normally; the heartbeat (interconnect) network card has to be bounced several times before the node can start on its own. The preliminary diagnosis is that HAIP fails for an unexplained reason, so one node cannot start CRS.
1. Check the network
[grid@gmdb1 trace]$ oifcfg iflist -p -n
bond0 22.1.32.0 UNKNOWN 255.255.254.0
bond1 1.255.255.0 UNKNOWN 255.255.255.0
bond1 169.254.0.0 UNKNOWN 255.255.0.0
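HAIP addresses are taken from the 169.254.0.0/16 link-local range, so a 169.254 subnet in the `oifcfg iflist` output indicates which interface carries HAIP on that node. A minimal sketch of that check, assuming the four-column layout shown above (name, subnet, type, netmask); the function name `haip_interfaces` is made up for illustration:

```shell
# Print the interfaces whose subnet lies in 169.254.0.0/16 (the HAIP range).
# Input: `oifcfg iflist -p -n` output on stdin.
haip_interfaces() {
    awk '$2 ~ /^169\.254\./ { print $1 }'
}

# Sample input copied from the check above.
sample='bond0 22.1.32.0 UNKNOWN 255.255.254.0
bond1 1.255.255.0 UNKNOWN 255.255.255.0
bond1 169.254.0.0 UNKNOWN 255.255.0.0'

haip_result=$(printf '%s\n' "$sample" | haip_interfaces)
echo "interfaces carrying a HAIP subnet: ${haip_result:-none}"
```

On a live node the same function could be fed directly with `oifcfg iflist -p -n | haip_interfaces`; an empty result on the failed node would be consistent with HAIP not being up there.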
2. Check CRS
[root@gmdb2 tmp]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
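In the output above only Oracle High Availability Services reports "is online"; CRS, CSS and EVM do not. A small sketch that filters such output down to the problem lines, assuming healthy components are always reported with the phrase "is online" as in the first line above (the function name `crs_problems` is hypothetical):

```shell
# Keep only the lines of `crsctl check crs` output that do not say "is online".
# Input: the command output on stdin.
crs_problems() {
    grep -v -e 'is online' -e '^[[:space:]]*$' || true
}

# Sample input copied from the check above.
sample='CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager'

crs_out=$(printf '%s\n' "$sample" | crs_problems)
printf '%s\n' "$crs_out"
```

With the sample this leaves the three CRS-4535, CRS-4530 and CRS-4534 lines, which is the picture of a stack where OHASD is up but nothing above CSSD has started.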
3. Check the init resources: ASM and HAIP cannot be started
[root@gmdb2 tmp]# crsctl stat res -t -init
NAME           TARGET  STATE        SERVER        STATE_DETAILS
Cluster Resources
ora.asm
      1        ONLINE  OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
4. Check multicast with mcasttest.pl; no problem is found:
[grid@gmdb2 mcasttest]$ perl mcasttest.pl -n gmdb2,gmdb1 -i bond0,bond1
# Setup for node gmdb2 #
Checking node access 'gmdb2'
Checking node login 'gmdb2'
Checking/Creating Directory /tmp/mcasttest for binary on node 'gmdb2'
Distributing mcast2 binary to node 'gmdb2'
# Setup for node gmdb1 #
Checking node access 'gmdb1'
Checking node login 'gmdb1'
Checking/Creating Directory /tmp/mcasttest for binary on node 'gmdb1'
Distributing mcast2 binary to node 'gmdb1'
# testing Multicast on all nodes #
Test for Multicast address 230.0.1.0
Nov 28 16:42:02 | Multicast Succeeded for bond0 using address 230.0.1.0:42000
Nov 28 16:42:03 | Multicast Succeeded for bond1 using address 230.0.1.0:42001
Test for Multicast address 224.0.0.251
Nov 28 16:42:04 | Multicast Succeeded for bond0 using address 224.0.0.251:42002
Nov 28 16:42:05 | Multicast Succeeded for bond1 using address 224.0.0.251:42003
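With more interfaces or nodes the mcasttest output gets long, so a per-interface tally can help. The sketch below counts result lines per interface. It relies on the "Multicast Succeeded for <interface>" wording seen above; the matching "Multicast Failed for" wording is an assumption about how the tool reports failures, and `mcast_summary` is a made-up helper name:

```shell
# Tally mcasttest.pl result lines per interface.
# Input: mcasttest.pl output on stdin. "Failed" wording is assumed, not
# taken from the output above.
mcast_summary() {
    awk '
        /Multicast (Succeeded|Failed) for/ {
            # The interface name is the word following "for".
            for (i = 1; i < NF; i++) if ($i == "for") iface = $(i + 1)
            if ($0 ~ /Succeeded/) ok[iface]++; else bad[iface]++
            seen[iface] = 1
        }
        END {
            for (n in seen) printf "%s succeeded=%d failed=%d\n", n, ok[n], bad[n]
        }
    ' | sort
}

# Sample result lines taken from the run above.
sample='Test for Multicast address 230.0.1.0
Nov 28 16:42:02 | Multicast Succeeded for bond0 using address 230.0.1.0:42000
Nov 28 16:42:03 | Multicast Succeeded for bond1 using address 230.0.1.0:42001
Test for Multicast address 224.0.0.251
Nov 28 16:42:04 | Multicast Succeeded for bond0 using address 224.0.0.251:42002
Nov 28 16:42:05 | Multicast Succeeded for bond1 using address 224.0.0.251:42003'

mcast_out=$(printf '%s\n' "$sample" | mcast_summary)
printf '%s\n' "$mcast_out"
```

For the run above this reports two successes and no failures for each of bond0 and bond1, matching the conclusion that multicast is not the problem.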
5. Check the CSSD log
2017-11-28 11:48:02.797: [CSSD][2139567872]clssnmLocalJoinEvent: begin on node (2), waittime 193000
2017-11-28 11:48:02.797: [CSSD][2139567872]clssnmLocalJoinEvent: set curtime (1040905644) for my node
2017-11-28 11:48:02.797: [CSSD][2139567872]clssnmLocalJoinEvent: scanning 32 nodes
2017-11-28 11:48:02.797: [CSSD][2139567872]clssnmLocalJoinEvent: Node gmdb1, number 1, is in an existing cluster with disk state 3
2017-11-28 11:48:02.797: [CSSD][2139567872]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2017-11-28 11:48:02.808: [CSSD][2358462208]clssnmvDHBValidateNcopy: node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564, wrtcnt, 39931581, LATS 1040905654, lastSeqNo 39931578, uniqueness 1510056501, timestamp 1511840882/1783220964
2017-11-28 11:48:03.287: [CSSD][2144298752]clssgmWaitOnEventValue: after CmInfo Stateval 3, eval 1 waited 0
2017-11-28 11:48:03.782: [CSSD][2363209472]clssnmvDHBValidateNcopy: node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564, wrtcnt, 39931583, LATS 1040906624
The log contains a large number of "has a disk HB, but no network HB" records: the local node sees gmdb1's disk heartbeat but receives no network heartbeat from it.
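To quantify how often this happens, the "no network HB" records can be counted per remote node. This is a rough sketch that assumes the message wording shown in the log excerpt above; the function name `no_network_hb_counts` is hypothetical, and the location of the CSSD log depends on the Grid Infrastructure version and is not given here:

```shell
# Count "has a disk HB, but no network HB" records per remote node.
# Input: CSSD log text on stdin. Output: "<node> <count>" lines.
no_network_hb_counts() {
    sed -n 's/.*: node [0-9][0-9]*, \([^,]*\), has a disk HB, but no network HB.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Sample lines taken from the log excerpt above (tails shortened).
sample='2017-11-28 11:48:02.797: [CSSD][2139567872]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2017-11-28 11:48:02.808: [CSSD][2358462208]clssnmvDHBValidateNcopy: node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564
2017-11-28 11:48:03.782: [CSSD][2363209472]clssnmvDHBValidateNcopy: node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564'

hb_out=$(printf '%s\n' "$sample" | no_network_hb_counts)
printf '%s\n' "$hb_out"
```

A count that keeps growing while the multicast test passes is consistent with the diagnosis here: the physical network works, but the private addresses CSSD is trying to use do not.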
Check v$cluster_interconnects:
SQL> select * from v$cluster_interconnects;
NAME    IP_ADDRESS      IS_ SOURCE
eth2:1  169.254.134.65  NO
HAIP is running on the other node, but the local HAIP cannot be started, so CSSD cannot start. Check the HAIP resource profile and its dependency on CSSD:
[root@12crac2]# crsctl stat res ora.cluster_interconnect.haip -init -f
NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
STATE=OFFLINE
TARGET=ONLINE
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CARDINALITY_ID=0
CHECK_INTERVAL=30
CREATION_SEED=15
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for a Highly Available network IP"
ENABLED=0
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
ID=ora.cluster_interconnect.haip
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=balanced
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_DEPENDENCIES=hard(ora.gpnpd,ora.cssd)pullup(ora.cssd)
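The profile above has TARGET=ONLINE but STATE=OFFLINE, which is the signature of a resource that should be running and is not. The sketch below scans `NAME=`/`TARGET=`/`STATE=` blocks for that combination. It assumes the blocks are separated by blank lines, as when several resources are listed; the `ora.gpnpd` block in the sample is invented purely for contrast, and `offline_resources` is a hypothetical name:

```shell
# Print resources whose TARGET is ONLINE while STATE is OFFLINE.
# Input: key=value resource blocks on stdin, separated by blank lines.
offline_resources() {
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        name = target = state = ""
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")
            if (eq == 0) continue
            key = substr($i, 1, eq - 1)
            val = substr($i, eq + 1)
            if (key == "NAME") name = val
            else if (key == "TARGET") target = val
            else if (key == "STATE") state = val
        }
        if (target == "ONLINE" && state ~ /^OFFLINE/) print name
    }'
}

# First block: from the listing above. Second block: made up for contrast.
sample='NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
STATE=OFFLINE
TARGET=ONLINE

NAME=ora.gpnpd
TYPE=ora.gpnp.type
STATE=ONLINE
TARGET=ONLINE'

offline_out=$(printf '%s\n' "$sample" | offline_resources)
printf '%s\n' "$offline_out"
```

Because the fields are matched by key rather than by position, the order in which the attributes are printed does not matter.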
Temporary solution:
Once the heartbeat network itself has been confirmed to be working, disable HAIP and remove it from the ASM start dependencies:
crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init
crsctl modify res ora.asm -attr "START_DEPENDENCIES='hard(ora.cssd,ora.ctssd)pullup(ora.cssd,ora.ctssd)weak(ora.drivers.acfs)',STOP_DEPENDENCIES='hard(intermediate:ora.cssd)'" -init
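Since both commands change the Clusterware resource profiles, it can be worth reviewing them before they are executed. The following is only a sketch of a dry-run wrapper around the two commands from this article; the `run` and `disable_haip` helpers and the `DRY_RUN` variable are made up for illustration, and nothing is executed unless `DRY_RUN=0` is set explicitly:

```shell
# Dry-run wrapper: by default the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

# The two modifications from the temporary solution (must be run as root
# when actually executed).
disable_haip() {
    run crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init
    run crsctl modify res ora.asm -attr "START_DEPENDENCIES='hard(ora.cssd,ora.ctssd)pullup(ora.cssd,ora.ctssd)weak(ora.drivers.acfs)',STOP_DEPENDENCIES='hard(intermediate:ora.cssd)'" -init
}

plan=$(disable_haip)
printf '%s\n' "$plan"
```

Running the script as shown only prints the two `crsctl modify` lines, which makes it easy to compare them with the MOS note before applying anything.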
After the modification is complete, check the resource status again.
Related articles on MOS, including the known HAIP bugs:
Known Issues: Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1640865.1)