From "the end of the Mountain" to "the Willow and the Flowers"-- the Story of me and the data Center in the 44th issue of the Technical Life Series 07/04 Update SLTechnology News&Howtos

From "the end of the Mountain" to "the Willow and the Flowers"-- the Story of me and the data Center in the 44th issue of the Technical Life Series

2025-07-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Write at the front

If your database performance often wobbles, sometimes fast and sometimes slow, how will you start and how will you solve it step by step? Is it to look at the waiting event first? Or watch AWR or ASH? Next, optimize SQL? Adjust the parameters? Or even replace the hardware? This time, I would like to share a case I dealt with at GCS. I hope you will get something new after reading it.

Here comes the problem.

In a quiet morning, I had planned to be able to read for a while and do a few experiments to pass the time. As soon as I picked up the book, the phone arrived as scheduled, turning the quiet morning into the beginning of work. Think about it carefully, this is the normal state of operation and maintenance people, and you will be relieved. Briefly chatted a few words, know some of the basic situation of the problem, and let the customer collect relevant information, began to analyze the problem.

Several rebuild index operations wait more than 10 hours!

Analysis of problems

Background introduction

Database version: 12.2.0.1 RAC

OS:Oracle Linux

Problem: rebuild index is slow

"here comes the routine."

There are a few things to confirm:

1. How big is the index? -confirm whether it is normal or abnormal to take too long. If the index is very large, then I think it is normal to wait for a long time. It has been confirmed that the index is not large, only a few G

two。 Where does the time go? -determine the next direction of investigation.

Maybe you have a way of thinking and start making moves:

First step: confirm the problem:

It appears that the re-indexing operation is initiated through a sqlplus connection.

According to the above information, a total of three index rebuild are recorded.

The second trick: where do you spend your time?

Now that you've confirmed that the session is indeed doing index reconstruction, go to ASH to see what the session is waiting for.

Based on the above output, it seems that the re-indexed statement is waiting for "gc cr multi block request" many times. This shows that this re-indexing operation needs to construct a consistent mirror of the data through the cr blocks saved in undo, and multiple cr blocks need to be accessed to complete the operation. For the specific principle, you can watch the video of the first open course of Yee College. If you haven't seen it, download it quickly.

Here is a detailed explanation of the wait event:

Principle explanation:

This wait event indicates that the applicant instance needs to request multiple cr blocks from the remote instance (possibly multiple) (the number of blocks depends on the setting of the parameter db_file_mulitblock_read_count). This wait event ends only after all requested blocks have been successfully returned, that is, if one of the blocks is not successfully accepted for some reason, all blocks need to be reapplied. This is why gc current/cr multi block request often occurs at the same time as the waiting event gc cr failure/ gc current retry.

For more explanations of cache fusion-related wait events, please refer to Chapter 12 of "detailed introduction to Oracle RAC Core Technologies."

Now I have the answer. Many people will feel that they have no way to start when they see gc current/cr multi block request. They seem to have run out of water. They have all the basic information here. Combined with the above principles, everyone will think about it for a few seconds. If you change it below, how can you continue to troubleshoot the problem? The reason for the problem is behind it. It's up to you to decide when to turn down.

Reason: re-indexing requires multiple blocks to be requested from another instance. The application efficiency is low and the setting of network parameters is unreasonable.

Brainstorming: three preliminary directions to be investigated.

1: there is a blocker process on the remote instance that holds blocks all the time, making it impossible for applicants to get the data.

2: too many data blocks are applied for, resulting in too much load on the private network, resulting in performance problems

3: there are some problems in the private network configuration of RAC, resulting in the data can not be quickly transferred to the remote node.

Continue to make moves.

The third move: check blocker.

Here is the information in AWR

Top 10 Foreground Events by TotalWait Time

The above information shows that global cache-related cr multi-block reads generate the most waits, with a total of 3030 waits, with a total waiting time of 1733.7 seconds, with an average waiting time of 572.19ms each time. So it seems that the waiting time is not very long, and it does not seem to be caused by one blocker process blocking other processes. If this is the case, what should happen is that the waits value of this wait event is low, but the Total Wait Time (sec) and Avg Wait are high. In line with the principle of everything being told by the facts, continue to investigate.

View bar chart information for waiting events:

Most of the waiting time is less than 512us.

Another part of waiting 1 is less than 32ms.

The longest waiting time is no more than 2s.

Note: 11g, 12c AWR report, will contain each waiting event waiting time corresponding to the bar chart information, it waiting events corresponding to the waiting time is divided into several groups, so that you can distinguish a waiting event caused by the waiting time is how to distribute, corresponding to understand the composition of waiting time is very helpful.

The above information shows that the main wait time distribution is less than 512us, while a small part of the wait is less than 2s, no more than 2s wait. Therefore, it is further confirmed that the problem is not caused by one Blocker process blocking other processes.

When you see this, you can basically rule out Direction 1 and continue to make moves:

Fourth trick: check the throughput of private network

According to the private network traffic information (EstdInterconnect traffic (KB)) in AWR, it seems that the private network traffic at that time is only about 1M/s, which is not very high. Compared with the normal situation, there is no significant change in private network traffic. When you see here, direction 2 can basically be ruled out.

So, there's only direction 3 left. Take a look at the netstat output of OSW to get the answer. The following is some of the intercepted OSW information.

The last big trick: make a final decision.

Ip layer statistics in netstat output:

The above information indicates that node 1 has always had the problem of packet reorganization failure in the IP layer. Looking back at the configuration of the database parameter db_file_multiblock_read_count, we found that this parameter was set to 126, and then looked at the output of ifconfig:

It seems that the cause of the problem is more obvious. The above information indicates that the mtu value of the private network card is less than 2k.

For a database whose block size is 8k and db_file_multiblock_read_count = 126 (which means that each database has multiple reads, for example, gc cr multiblock read needs to be cut into 8192 reads 128 pegs 2044 reads 513 at the IP layer, that is, more than 500 blocks), multiple reads will be fragmented, while the UDP protocol itself is the protocol of Unreliable, which not only means that IP packets are sent sequentially. Once the same application layer (for example, oracle's lms process) finds that one of the more than 500 packages is invalid, the whole package needs to be reorganized.

In addition, we also found that there has been a "gc cr block lost" waiting for this database instance in awr, which shows that there are indeed some problems in the transmission efficiency of the private network.

10 Foreground Events by Total WaitTime

Before giving a solution, answer a question: many people will ask where the three directions come from. Let me tell you in detail:

First: the wait event means that you want to apply for multiple data blocks from the remote instance, and it must be completed after all the applied blocks have been obtained.

Second: RAC's cachefusion is used as a data transfer protocol through UDP, and UDP itself is an unreliable data transfer protocol, that is to say, data transmission is unnumbered (unlike TCP protocol), and data integrity is verified by the application itself.

Third: the bottom layer of UDP protocol is to cut the data into IP packets for transmission.

To quote the detailed introduction of Oracle RAC Core Technologies, a packet goes through the following steps from one instance to another:

Step 1: the lms process of node 1 needs to send a data block (assuming db_block_size=8k,MTU=1500) to a process of node 2 and notify the OS (operating system) of the request

Step 2: the operating system of node 1 needs to cut this data block into about 6 data fragments and put it into the UDP send cache (SendBuffer) of the corresponding port of node 1.

Step 3: data fragments are sent to node 2 one after another through the network

Step 4: the operating system of node 2 receives the packet sent and saves it to the corresponding UDP receive cache (Recevie Buffer)

Step 5: node 2 begins to assemble data shards. When all six data shards are assembled successfully, OS sends the assembled packets to the corresponding receiving process.

Step 6: the receiving process of node 2 processes the received packets.

Use an analogical example: I go to the bank to deposit 1 million yuan (although I don't have that much money! ), before the bank declares its deposit success, it completes the following process:

1: give my 1 million to 10 people and 10 banknote counters and make sure that the amount of money I have saved is 1 million

2: put the 1 million in the safe of the bank

3: inform me of my success in saving money.

Throughout the process, the situation that may occur is:

+ + there is a 100 yuan note stuck in the counting machine, how can not take it out, then this counting machine is a Blocker, it blocks others, so that the process of counting money can not be completed.

+ + each banknote counter is very slow, counting only 10 banknotes per minute, which slows down the counting time as a whole.

+ + the staff who put the money in the safe can only transport 10,000 yuan to the safe at a time, that is, the transmission is too slow.

Now that we know the whole process and the process of analogy, it is easy to think of the possibility of problems that may arise in the middle. Then the three directions mentioned above naturally came to my mind.

Solution

Now that we know that the reason for the problem is that the ip layer needs to cut each multi-read udp packet into more than 500 small ip packets for transmission, resulting in the fragmentation and reassembly failure of the IP layer process, then there are the following solutions.

Method 1: adjust the MTU value of the private network card to at least 7000, so that the IP layer does not need to cut up the data too much, thus improving the transmission efficiency

Method 2: adjust the database parameter db_file_multiblock_read_count to reduce the size of each packet sent, so that the IP layer does not need to cut the data blocks too much, so as to improve the transmission efficiency

Method 3: increase the fragment reassembly buffer of the UDP, so that the IP layer has more time to reconstitute the fragments into a complete packet, and reduce the probability of retransmission of the whole packet.

Finally, the user chose to increase the following parameters

Net.ipv4.ipfrag_high_thresh=16777216

Net.ipv4.ipfrag_low_thresh= 15728640

And adjust the database parameter db_file_multiblock_read_count to 16, and after applying the change, the operation of rebuilding the index is completed in about 5 minutes.

Summary

1. For RAC database, the private network communication of cache fusion is realized through UDP, while for multi-block reads, it is necessary for all requested data blocks to reach the remote node before the request operation is completed. Even if only one packet is not transmitted, all data blocks need to be retransmitted.

two。 For multi-block read operations, the number of blocks per request is controlled by the parameter db_file_multiblock_read_count

3. For RAC systems, we should pay special attention to the setting of operating system kernel parameters related to UDP, not only the configuration of udp buffer size, but also the configuration of buffer size of fragment reassembly.

This article is reproduced in Zhongyi Antu.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.