Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Systemstate Dump Analysis of Classical cases (part I)

2025-03-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

Preface

In this issue, we invite Lao K, another Oracle expert from Zhongyi Technology, to share with you the classic cases of systemstate dump analysis. After that, we will have more technical experts to bring more sincerity to share.

As a database engineer who has been working hard in the data center for a long time, Lao K feels eager to see Xiao y's early sharing. He also wants to share some interesting cases that I encounter every day. I hope we can all get some gains from it and avoid detours. At the same time, this article involves a lot of basic knowledge, but also involves the seemingly boring trace analysis, but Lao K still advises you to read this article patiently.

Wonderful preview

How to analyze cursor:pin S wait on X?

How to analyze library cache lock?

How to analyze and interpret systemstate dump?

How to deal with emergency response quickly and collect information?

reminder

Warm Tip: this is Lao K's sincere work, a little longer, if you have some discomfort in reading Wechat, you can move to the QQ group: 227189100 download text to read, and at the same time follow our WeChat account "Zhongyi Antu" to communicate with us.

Systemstate Dump, let's call it SSD.

Part 1

Here comes the problem.

One weekend morning, the customer called and one of the nodes in the two-node RAC database was rammed to death.

Phenomenon description:

> > Node hang is dead, SYS and ordinary users are unable to log in

> > the affected area is only on one of the nodes, and the other nodes can provide services normally.

> > hang dead nodes have a large number of exception waiting events cursor:pin S wait on X and a small amount of library cache lock.

Part 2

Fault handling and information collection

The first reaction of Lao K is to let customers quickly collect database hanganalyze and SSD, immediately restart the problem node database by killing process, and give priority to restore database service.

Finally, after collecting the information and sending it to Lao K, the customer restarted the problem node database and everything returned to normal.

Part 3

Knowledge point literacy

Cursor:pin S wait on X

When will this wait event occur?

When a session holds the mutex of a cursor (such as sql/procedure/function/package body, etc.) in X mode, if another session needs to request the mutex; of the cursor in S mode, generally speaking, when hard parsing of cursor, it holds the mutex of cursor in X mode, while soft parsing of cursor, it holds the mutex of cursor in S mode.

To take a simple example, a session (SESSION_A) is parsing (hard parsing) a sql statement (SQL_A), and when another session (SESSION_B) executes the sql statement (SQL_A) at the same time (the statement needs to be soft parsed before execution), SESSION_B waits for the cursor:pin S wait on X event.

How to locate the object they are waiting for?

The P1 parameter idn of the waiting event is actually the hash_value of the sql statement, that is, the actual waiting object can be located through the P1 parameter of the waiting event.

How to find the source of the incident?

After locating the object that the waiting event is waiting for, the owner of the object MUTEX is the source of the waiting event.

In the trace file, you can find the holder through the oper EXCL keyword.

Library cache lock

When will this wait event occur?

When a session modifies an object in library cache (mainly TABLE / INDEX/CLUSTER/PROCEDURE, etc.) (usually refers to the DDL operation), the library cache lock; of the object is held in X mode. When a session needs to use an object in parsing sql, it requests the library cache lock of the object in S mode.

To take a simple example, one session (SESSION_A) is DDL the table TAB_A, and another session (SESSION_B) starts executing sql statements related to TAB_A, and the SESSION_B waits for the library cache lock event at this time.

How to locate the object they are waiting for?

The P1 of the waiting event is handle address, that is, the address of the handle of the waiting object in library cache, which can be uniquely marked with the memory object.

How to produce the source of the incident?

After locating the object that the waiting event is waiting for, the session that holds the X-mode library cache lock of the object is the source of the waiting event.

In the trace file, the source of the waiting event can be found by the address keyword of the object and the mode=X keyword.

So what happened in the abnormal time of the database? let's start with trace and restore the scene.

Part 4

Malfunction analysis

Environment introduction:

Operating system AIX 5.3

Database ORACLE 10.2.0.5 two-node RAC

4.1 first brainstorming

Existing "intelligence"

> > one node rammer in RAC system

> > there are a large number of cursor:pin S wait on X wait events and a few library cache lock wait events in the database

> > there are collected hanganalyze information and SSD trace files

What is the principle of these two waiting events?

What is the performance of the system after the above wait event?

When a large number of cursor:pin S wait on X wait events occur in a system, the reason is usually due to the hard parsing exception of a session sql, which blocks the soft parsing of the SQL. In this case, there is only one possible source. As long as the source is found, the problem is solved.

4.2 start with analysis

4.2.1

SSD starts with analysis

General handling method: for cursor:pin S wait on X waiting events, you only need to pay attention to the waiting object, whether it is the same object or multiple different objects. If you are all waiting on the same object, the situation is relatively simple. To find the waiting object, you need to find the keyword 'waiting for' cursor:pin S wait on X', in SSD's trace. The result is shown below:

To Lao K's surprise, these sessions waiting for "cursor:pin S wait on X" are not all on the same idn, but are distributed on several different idn.

It seems that the problem is more complex than Lao K first thought, but it doesn't matter. According to Lao K's experience in dealing with various problems, complex problems are just a collection of simple problems, and all it takes is a little more patience.

Next, what Lao K does is to find the owner of these cursor object mutex, take 82d61778 as an example, select one of the sid:598 waiting for this object to analyze, as shown in the following figure

This needs to be explained.

Idn 82d61778: indicates the cursor object

Oper GET_SHRD: indicates that the session is requesting the mutex in shared mode

(858,0): indicates that the holder sid of the mutex is 858

Having found the holder, let's take a look at what the session with a sid of 858 is doing:

The above figure shows that 858 sessions are also waiting for "cursor:pin S wait on X", and from the history of waiting, 858 sessions have been waiting for a very long time; in the same way, let's see who holds the mutex of 858 session requests:

We see that session 859 holds bbcee4f7's mutex, and then it waits for the "library cache lock" event.

This is the end of the problem, let's stop and think about it.

4.3.2

The second brainstorming

> > session 598 is waiting for cursor:pin S wait on X. The blocking sid is 858.

> > session 858 is waiting for cursor:pin S wait on X event, blocker sid is 859

> > session 859 is waiting for the library cache lock event. The blocker is waiting to be checked.

Who is the blocker of library cache lock? it is likely to be one of the culprits.

> > is it true that a large number of cursor:pin S wait on X are blocked by library cache lock? if so, the problem becomes simple.

If you are still dizzy when you see this, then Lao K suggests that the reader might as well take out a pen and draw a picture, which we will call a waiting chain diagram for the time being:

4.3.3

Continue to analyze SSD

Instead of checking the "prime suspect" for the time being, we continue to sort out the waiting event relationship. With the same analysis method as above, we find the owner information of each IDN object in trace, as follows:

We can see that sessions such as 859 mutex 858 mutex currently hold mutex and block other sessions from obtaining their held mutex; in shared mode, where 859 session is the source of the waiting chain we just found, and 858 session is the middle part of the waiting chain we just found, as the holder of an mutex while waiting for another mutex, let's see what other conversations are waiting for:

Lao K does not list the information of all the above conversations here. After confirmation, all conversations are waiting for the "cursor:pin S wait on X" waiting event. At this time, let's update our waiting chain diagram:

Is it already clear when it comes to the analysis? A large number of cursor:pin S wait on X managers have made it clear that all the blame is directed at sid 859.

We are only one step away from the truth. We only need to analyze the source of library cache lock to explain the whole mystery. Lao K has already mentioned the method of analyzing library cache lock waiting events.

The next step is to locate the blocking relationship of library cache lock with the trace file, and you can quickly locate the cause.

Due to the limited space, the sharing of this issue ends here, and the next sharing continues to see how Lao K goes from shallow to deep step by step to analyze the problem and see what the high-end SSD analysis looks like. Please follow the next issue (to be continued.)

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report