2025-03-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
As the saying goes, "blessings never come in pairs, and misfortunes never come singly." A two-node Oracle 11.2.0.4 database runs in production. Node 2 went down because of a hardware failure, and then node 1 hung. Let's analyze this failure together.
At 4:30 a.m. the on-duty staff called to report that node 2 of a production database was down. Colleagues in the machine room saw that the server was rebooting, which pointed to a hardware fault. Since node 1 was still running, I assumed the service would be fine and went back to sleep, while another DBA who lives near the office went on site to help. Before long he reported back that the surviving node also had problems. OGG had been deployed on node 2 and had switched over to node 1 automatically when node 2 went down, but its replication lag kept growing and it did not appear to be applying any changes.
An attempt to connect with sqlplus failed with ORA-00020 (maximum number of processes exceeded), so we could not log in to the database or inspect its current state.
The next step was to work out which application servers were connecting to this database and whether an application problem was the cause.
We found the IP with the most connections, confirmed with the relevant business staff which application ran on it, and blocked its access to the database port to reduce external connections. After that IP was blocked, however, the connection counts from other IPs climbed again. I began to suspect that a database problem was slowing the applications down, and that the slow processing was what produced so many connections. At this point we still could not log in to the database to verify that.
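Finding the client IP with the most connections can be sketched as follows. This is a minimal illustration, not the author's actual procedure: it assumes a Linux host, `netstat -ant` style output, and the default listener port 1521, and the sample lines and the name `count_clients` are made up for the example.

```python
from collections import Counter

LISTENER_PORT = "1521"  # assumption: default Oracle listener port


def count_clients(netstat_lines, port=LISTENER_PORT):
    """Count ESTABLISHED connections to the listener port, grouped by remote IP."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Expected Linux `netstat -ant` columns:
        # proto recv-q send-q local_address foreign_address state
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        local, remote = fields[3], fields[4]
        if local.rsplit(":", 1)[-1] != port:
            continue  # not a connection to the database listener
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    "tcp 0 0 10.0.0.1:1521 10.0.1.15:40112 ESTABLISHED",
    "tcp 0 0 10.0.0.1:1521 10.0.1.15:40113 ESTABLISHED",
    "tcp 0 0 10.0.0.1:1521 10.0.1.22:51877 ESTABLISHED",
    "tcp 0 0 10.0.0.1:22 10.0.9.9:60000 ESTABLISHED",
    "tcp 0 0 0.0.0.0:1521 0.0.0.0:* LISTEN",
]

# Busiest client IPs first.
for ip, n in count_clients(sample).most_common():
    print(ip, n)
```

Re-running the count after blocking one IP would show whether other clients are filling the freed process slots, which is the pattern described above.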
We asked the business department whether we could kill some sessions so that a DBA could connect to the database for administrative operations and performance analysis. Once they agreed, we killed some of the LOCAL=NO sessions and logged in to the database as sysdba. The first diagnostic statement, which checked session wait events, returned, but the second SQL statement hung. A login from a new window still failed with ORA-00020. This made it more certain that the OGG and application problems were caused by a database performance problem.
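Selecting the LOCAL=NO server processes can be sketched as below. This is only an illustration under assumptions: a Linux or Unix host, `ps -ef` style output, and a hypothetical instance name `ORCL1`. The helper `local_no_pids` is invented for the example and only lists candidate PIDs; it does not kill anything.

```python
def local_no_pids(ps_lines, sid):
    """Return PIDs of dedicated server processes serving remote clients (LOCAL=NO)."""
    marker = f"oracle{sid} (LOCAL=NO)"
    pids = []
    for line in ps_lines:
        if marker not in line:
            continue  # background processes and local (bequeath) sessions are skipped
        fields = line.split()
        # `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD
        if len(fields) > 1 and fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids


# Made-up sample data; ORCL1 is a hypothetical instance name.
sample = [
    "oracle   4101     1  0 03:10 ?  00:00:02 oracleORCL1 (LOCAL=NO)",
    "oracle   4102     1  0 03:11 ?  00:00:01 oracleORCL1 (LOCAL=NO)",
    "oracle   2210     1  0 Mar01 ?  00:10:44 ora_smon_ORCL1",
    "oracle   4300  4299  0 04:55 ?  00:00:00 oracleORCL1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))",
]

print(local_no_pids(sample, "ORCL1"))
```

Matching on the full `oracle<SID> (LOCAL=NO)` string keeps background processes such as SMON and local sysdba sessions out of the list, so only client connections are candidates for termination.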
The database was hung. How do you analyze that?
I remembered hanganalyze, which someone had shared before as a way to find the cause when a database hangs, so I ran ORADEBUG hanganalyze 3, analyzed the trace file, and looked at the hang chain it recorded.
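The article gives only the command `ORADEBUG hanganalyze 3`. A commonly used surrounding sequence, run from a SYSDBA SQL*Plus session, is sketched below as a small helper that just assembles the command text. The helper name `hanganalyze_script` is hypothetical, and taking two dumps is a general habit for comparing chains, not something the article states was done.

```python
def hanganalyze_script(level=3):
    """Build the oradebug command sequence as text for a SYSDBA SQL*Plus session."""
    dump = f"oradebug hanganalyze {level}"
    return "\n".join([
        "oradebug setmypid",        # attach oradebug to the current session's process
        "oradebug unlimit",         # lift the trace file size limit
        dump,                       # first hang analysis dump
        dump,                       # second dump, taken after a pause by the operator
        "oradebug tracefile_name",  # print the path of the trace file that was written
    ])


print(hanganalyze_script())
```

If the same sessions head the chain in both dumps, that suggests a true hang rather than a slow but progressing system.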
Reading down the chain, SMON was waiting on parallel recovery coord wait for reply. It had been waiting for 289 minutes, roughly the time from the start of the fault to the hanganalyze run, and it was blocking 1465 sessions.
The trace shows two wait events: parallel recovery coord wait for reply and gc domain validation. I had not seen these wait events before, so I searched MOS. There are few documents about them; I found one article.
I could not tell whether an Oracle bug had been triggered.
Because time was short, the only option was to restart the database instance on node 1. After the restart, the database returned to normal.
Afterwards I asked an expert to help analyze the cause, and we looked at the trace information of the SMON process.
It showed that parallel recovery was in progress. We checked the SMON process monitoring data in OSW and found no performance problem.
We saw a large number of p00xx processes, which indicates that recovery was running in parallel; nothing abnormal was visible there.
The expert suggested using TFA to review the logs in detail, but there was no time for that analysis and it was shelved.
To summarize the failure: node 2 went down, node 1 had to take over node 2's data, and node 1 hung during that takeover.
© 2024 shulou.com SLNews company. All rights reserved.