log file sync is one of the most common wait events in Oracle. In general a log file sync wait is very short, on the order of 1-5ms, and causes no problems; but when something does go wrong, it is often hard to resolve. When does a log file sync wait occur?
The common ones are as follows:
1) commit operation
2) rollback operation
3) DDL operations (a commit is performed before the DDL is executed)
4) commits generated by data dictionary modifications caused by DDL operations
5) some operations that recursively modify the data dictionary, such as fetching the next value of a sequence, may also modify the data dictionary. A typical case is a sequence whose cache property is set to nocache, which causes the data dictionary to be updated every time the sequence is used, resulting in a recursive commit. An example follows the next paragraph.
In a normal system, the vast majority of log file sync waits should come from 1), commit operations. Abnormal systems, such as those with frequent rollbacks or with sequences whose cache is set to nocache, may also produce a large number of log file sync waits.
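As a small illustration of point 5), the sketch below contrasts a nocache sequence with a cached one; the sequence name and cache size are only examples, not from the original article.

-- A nocache sequence: every NEXTVAL updates the data dictionary (seq$)
-- and triggers a recursive commit, adding log file sync waits.
create sequence demo_seq nocache;

-- Caching values keeps most NEXTVAL calls out of the data dictionary,
-- so the recursive commits (and the extra log file sync waits) largely disappear.
alter sequence demo_seq cache 100;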
If we know which steps make up a log file sync wait, we can optimize each step individually and get much better results for the effort.
The figure above is Tanel Poder's latency diagram for the log file sync wait event, which annotates the key steps. A brief translation of the steps follows:
1) the user process initiates a commit
2) the user process notifies lgwr to write the log
3) lgwr receives the request and starts writing
4) lgwr finishes writing
5) lgwr notifies the user process that the write is finished
6) the user process is notified and continues with other work
Stages 1-2: this is mainly post/wait time. Typically this post/wait is implemented with operating system semaphores (IPC). If the system has sufficient CPU resources, there is generally no large delay here. After the foreground process posts lgwr, it starts waiting on log file sync.
Stages 2-3: this is mainly the time lgwr spends waiting for CPU scheduling before it can get onto a CPU. If the system has sufficient CPU resources, there is generally no large delay. Keep in mind that lgwr is just an operating system process; it must be scheduled by the OS and obtain CPU before it can do any work.
Stages 3-4: this is mainly the real physical I/O time. lgwr asks the OS to write the contents of the log buffer to disk and then goes to sleep (waiting on log file parallel write). This normally accounts for most of the total log file sync time. Note that lgwr may not be able to issue the write immediately after getting the CPU: the real I/O can only start once all redo copy latches have been released.
Stages 4-5: the OS schedules lgwr back onto a CPU, and lgwr posts the foreground process that the write is complete. lgwr may post many foreground processes at once (a side effect of group commit).
Stages 5-6: the foreground process receives lgwr's notification, returns to the CPU run queue, and goes on to do other work (the log file sync wait ends).
/*** What is group commit ***/
More than once I have seen log file sync tuning advice that reads: turn on Oracle's group commit feature.
Group commit is on by default, and you don't have any means to turn it off!
I have always felt that the name "group commit" is not quite accurate; "group flush" would be more appropriate. This is only my personal opinion.
So what exactly is a group commit?
The image above is an abstract diagram of the log buffer; at this moment the log buffer is very busy. Let me set up a scenario:
c1, a commit record, has already been copied into the log buffer, and the foreground process then tells lgwr to write the log. As described earlier, between the moment the foreground process posts lgwr and the moment lgwr actually starts writing, there is a time gap. During that gap, c2, g1 and c3 also copy their redo into the log buffer; c2 and c3 are commit records, while g1 is just an ordinary transaction log record, not a commit record. Before lgwr actually starts writing, it checks the current high point of the log buffer and finds it at position c3, so it takes that point as the target of this flush and writes the logs of c1, c2, g1 and c3 to disk in one go. Although the flush was triggered by c1, the records of c2, g1 and c3 get a free ride and are flushed to the log file as well. This behavior is what is called group commit. People not familiar with Oracle easily misunderstand group commit as packaging up committed transactions into the log; in fact lgwr simply flushes up to the highest allocated point in the log buffer, regardless of whether your transaction's redo belongs to a commit or not. That is why I think "group flush" is a better name.
In the picture, the redo of c1 and c2 has already been copied in (shown as "filled"), while the log buffer space for c3 has been allocated but its copy is not yet complete (shown as "allo"). In this case lgwr must wait for c3's copy to finish before it can really start the flush.
/*** What is group commit ***/
Having broken log file sync down into its stages, we can see that under normal circumstances the slowest part should be stages 3-4 (as shown in the figure above), a typical I/O operation whose corresponding database wait is log file parallel write. The other stages, such as scheduling delay and IPC time, are generally very short. On the Internet, in forums and in many books, people tend to blame slow I/O whenever log file sync is a problem. In most cases that inference is correct, but not always: as analyzed above, log file sync also depends on whether CPU resources are plentiful and whether the system is overloaded. The next diagram shows how much each stage contributes to the total log file sync time when CPU resources are insufficient.
As you can see, because CPU resources are scarce and the system is overloaded, operating system scheduling introduces a large delay; the I/O part in stages 3-4 is no longer the biggest contributor to the total log file sync time!
/*** Aside ***/
Which foreground processes will lgwr post?
When lgwr finishes flushing the log, it posts (wakes up) the corresponding foreground processes so they can continue working. How does lgwr decide which foreground processes should be woken up?
The p1 parameter of the log file sync wait event means: P1 = the buffer# in the log buffer that needs to be flushed.
After lgwr flushes the buffer, it scans the active session list for sessions waiting on log file sync; any session whose buffer# is less than or equal to the highest log buffer position just flushed will be woken up.
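As a hedged illustration, the query below lists sessions currently waiting on log file sync together with their p1 (the buffer# each is waiting to have flushed); it uses only standard v$session columns.

-- Sessions currently waiting on log file sync and the log buffer
-- position (buffer#) each one is waiting to see flushed.
select sid, event, p1 as "buffer#", seconds_in_wait
  from v$session
 where event = 'log file sync'
   and state = 'WAITING';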
/*** Aside ***/
LOG FILE SYNC tuning
As a general approach to diagnosing and tuning log file sync, we can usually determine where the problem lies by checking the system's I/O latency and whether CPU resources are sufficient.
Diagnosing and tuning I/O latency: the background wait event log file parallel write can assist the judgment. If that wait is too long, there is probably something wrong with your system's I/O. Typical ways to optimize I/O: do not put the redo logs on RAID 5, RAID 10 is preferable; disable the RAID card's read cache and devote it all to write cache; use 2-4 disks as a dedicated log disk group. If you are on enterprise storage, the array controller cache is usually large and I/O is basically guaranteed. Using SSD for the log group does not help much. The v$event_histogram view can be used to obtain the time distribution of the log file sync and log file parallel write wait events (described later) to assist the diagnosis.
Diagnosing and tuning CPU resources: if the gap between log file sync and log file parallel write is too large, the system's CPU resources may be insufficient. On Solaris you can also use the operating system tool prstat to check the latency of the lgwr process (the LAT column in the figure below). On other platforms you cannot trace the process that directly, but the system load and CPU utilization can assist the diagnosis: for example, CPU utilization above 60% may start to introduce scheduling delay, and a CPU run queue longer than the number of physical cores carries the same risk. If CPU is the bottleneck, things get tricky: swapping a disk is easy, but upgrading CPUs usually is not, and the end result may be replacing the host. Still, there are things to try: raise the priority of lgwr, either via the database parameter _high_priority_processes or via the operating system renice command (the former is probably better). The purpose of raising lgwr's priority is to let it obtain CPU as easily as possible and reduce its time queued for scheduling.
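A hedged way to eyeball the gap between the two events instance-wide (cumulative since startup) is the query below against v$system_event; for a specific interval, AWR/statspack deltas are more accurate.

-- Average wait of log file sync vs. log file parallel write since startup.
-- A large difference with healthy I/O hints at CPU/scheduling delays.
select event,
       total_waits,
       round(time_waited_micro / 1000 / nullif(total_waits, 0), 2) as avg_ms
  from v$system_event
 where event in ('log file sync', 'log file parallel write');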
Tuning the application: sometimes the most effective means is not tuning the database or the hardware desperately, but, for example, seeing whether transactions can be merged, which reduces the number of log file sync waits and effectively improves the system's transaction efficiency.
Database tuning: improve log write performance by setting the Oracle redo log block size to 4K. From 11gR2 you can specify the block size of a redo log. On my 11.2.0.3 the change reports an error saying the specified value does not match the actual sector size; you can force it to succeed by setting the hidden parameter _disk_sector_size_override to true. The change is made with "alter database add logfile ... blocksize 4096". In a PL/SQL stress test committing with "commit write immediate wait", the gap before and after the optimization was nearly 4x, which is astonishing; with our business stress test the gain was only about 1500 TPS, which is still very good.
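The commands below sketch that change; the file path, group number and size are placeholders, and _disk_sector_size_override is a hidden parameter, so treat this as an example to test carefully rather than a recommended production step.

-- Allow a 4K redo block size even though the disks report 512-byte sectors.
alter system set "_disk_sector_size_override" = true scope=both;

-- Add a new redo log group with a 4K block size (path and size are examples).
alter database add logfile group 5 ('/u01/oradata/orcl/redo05.log') size 1g blocksize 4096;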
Memory tuning: on AIX, enabling memory read-ahead also improves TPS noticeably: dscrctl -n -b -s 1. See http://space.itpub.net/22034023/viewspace-751590.
/*** Poor LGWR ***/
If a large number of processes are waiting on log file sync, then as soon as lgwr finishes a write it wakes up the posted processes and puts them back on the CPU run queue, while lgwr itself gets pushed out of the run queue by the very processes it posted. At this point lgwr has already been working for a while (it just finished a log write) and the foreground processes have been waiting for a while (on log file sync), so under the operating system's default scheduling policy the foreground processes get higher priority for CPU, and lgwr, with CPU suddenly scarce, may not get scheduled at all. The result is a sharp drop in system throughput, because lgwr is no longer working (even if only briefly). This is why the earlier suggestion of raising the priority of the lgwr process is worth trying.
/*** Poor LGWR ***/
Get the time distribution of log file sync and log file parallel write
If we only look at the AWR report and get the average wait time of log file sync and log file parallel write over a period, that is sometimes not enough. We may want to know in more detail how many of, say, 10000 waits completed within 1ms, how many within 2ms, and so on. Querying v$event_histogram gives us this information, which is very helpful for diagnosing performance problems.
SQL> select event, wait_time_milli, wait_count
  2    from v$event_histogram
  3   where event = 'log file parallel write';

EVENT                        WAIT_TIME_MILLI  WAIT_COUNT
log file parallel write                    1       22677
log file parallel write                    2         424
log file parallel write                    4         141
log file parallel write                    8         340
log file parallel write                   16        1401
log file parallel write                   32         812
log file parallel write                   64         391
log file parallel write                  128          21
log file parallel write                  256           6
From the output above we can see that 22677 log file parallel write waits completed within 1ms, 424 within 2ms, and so on.
We can simply take two snapshots of V$EVENT_HISTOGRAM and diff them to see how the distribution changed during the interval, which helps with diagnosis (the same idea as AWR snapshots).
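A minimal sketch of that snapshot idea, assuming you are allowed to create a small helper table; the table name is made up for the example.

-- Snapshot: capture the current histogram.
create table event_hist_snap as
  select sysdate as snap_time, event, wait_time_milli, wait_count
    from v$event_histogram
   where event in ('log file sync', 'log file parallel write');

-- ... wait for the interval of interest, then compare:
select h.event, h.wait_time_milli,
       h.wait_count - s.wait_count as waits_in_interval
  from v$event_histogram h
  join event_hist_snap   s
    on s.event = h.event
   and s.wait_time_milli = h.wait_time_milli
 where h.event in ('log file sync', 'log file parallel write')
 order by h.event, h.wait_time_milli;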
Does LOG BUFFER need tuning?
In general, there is no need for tuning!
Since 10g, the log buffer is generally fairly large, typically one or more granules in size. Unless you see a significant number of log buffer space wait events, there is no need to resize the log buffer.
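A hedged quick check of the current log buffer size and whether log buffer space waits are showing up at all:

-- Current log buffer size as allocated in the SGA.
select name, bytes from v$sgastat where name = 'log_buffer';

-- Have sessions been waiting for space in the log buffer?
select event, total_waits, time_waited
  from v$system_event
 where event = 'log buffer space';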
Since 9.2, the log buffer has been split into multiple sub-buffers based on the number of CPUs in the system, which greatly relieves redo allocation latch contention. Unless you see obvious redo allocation latch contention, you do not need to adjust the number of log buffers.
Since 10g, private redo strands and IMU (in-memory undo) further reduce redo allocation latch contention; each private strand is protected by its own redo allocation latch. As above, there is generally no need to adjust anything.
Do the redo-related latches need tuning?
Redo copy latch: used only to track whether a process is currently copying data into the log buffer. Before lgwr can actually start writing, it must wait for the relevant processes to finish copying, during which time lgwr waits on LGWR wait for redo copy. The number of processes that can copy into the log buffer at the same time is determined by _log_simultaneous_copies.
Redo allocation latch: protects processes allocating space in the log buffer, ensuring that the space allocated to different processes does not overlap.
Redo writing latch: this latch actually protects a flag. After acquiring the latch a process updates the flag, for example changing 0 to 1, meaning lgwr is writing. Subsequent committing processes read the flag after acquiring the latch, so they know whether lgwr is currently writing and can avoid a lot of unnecessary repeated posts.
Now that we know what these latches do, do we need to tune them?
Generally not, unless waits on them have actually caught your attention. Oracle has kept optimizing the acquisition and release of these latches across versions; the redo allocation latch, for example, is already handled very efficiently. For the history of the redo allocation latch, see my other article: http://www.itpub.net/thread-1803010-1-1.html
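If you do want to check, a hedged query against v$latch shows whether these latches are seeing misses or sleeps worth worrying about:

-- Contention indicators for the redo-related latches.
select name, gets, misses, sleeps, immediate_gets, immediate_misses
  from v$latch
 where name in ('redo copy', 'redo allocation', 'redo writing')
 order by name;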
New tuning methods
In 10gR1, Oracle quietly introduced the commit_write parameter, whose settings can be combined in four ways:
Commit write [batch | immediate] [wait | nowait]
In a later release (11g), this parameter was split into two parameters, commit_logging and commit_wait. I personally think the split parameters express the intent more accurately.
Let's first focus on the wait/nowait setting (the commit_wait parameter). With wait, lgwr is notified when the foreground process commits and the foreground process waits on log file sync. With nowait, lgwr is not notified; the log write is left to be triggered by other conditions (every 3 seconds, 1MB of redo, the log buffer becoming 1/3 full, and so on). If your business does not demand strict durability, the D in ACID, you can choose nowait to increase transaction throughput. Before 10g the D in ACID always had to be satisfied: at commit the foreground process had to wait on log file sync until lgwr had flushed the redo to disk.
Now the commit_logging parameter. Its possible values are batch and immediate, which is extremely misleading: it sounds as if batch means transactions are packaged up and committed via group commit, while immediate means transactions are committed one by one, flushing the log buffer each time. That is not what it does!
With immediate, the commit change vector (the modification of the transaction slot in the rollback segment header) is generated as a separate redo record, the same way commits are logged in 9i. With batch, the commit is merged into the transaction's redo record as one of its change vectors. Batch mode relies on private redo and IMU; if they are disabled, the batch setting has no effect.
Let's dump the log file for a simple "insert into a values (1111); commit;" to show the difference between batch and immediate:
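For reference, dumps like the ones below can be reproduced with roughly the following steps; the table name comes from the test above, the file path is a placeholder, and the session-level parameter name assumes 11g.

-- Choose the commit logging mode for this session (immediate or batch).
alter session set commit_logging = immediate;

insert into a values (1111);
commit;

-- Find the current online redo log and dump it.
select member
  from v$logfile lf join v$log l on l.group# = lf.group#
 where l.status = 'CURRENT';

alter system dump logfile '/u01/oradata/orcl/redo01.log';   -- path is an example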
DUMP LOG FILE: private redo and IMU enabled, commit_logging set to immediate. The commit is generated as a separate redo record; two redo records are produced in total, and the second one comes from the commit (see the change with OP:5.4, which represents the update of the undo segment header).
REDO RECORD - Thread:1 RBA: 0x00044d.00000002.0010 LEN: 0x0230 VLD: 0x05
SCN: 0x0000.041b921c SUBSCN: 1 06/25/2013 11:27:32
(LWN RBA: 0x00044d.00000002.0010 LEN: 0002 NST: 0001 SCN: 0x0000.041b921c)
CHANGE #1 TYP:0 CLS:51 AFN:3 DBA:0x00c04c80 OBJ:4294967295 SCN:0x0000.041b91d1 SEQ:1 OP:5.2 ENC:0 RBL:0
ktudh redo: slt: 0x0016 sqn: 0x00002bee flg: 0x0012 siz: 136 fbi: 0
uba: 0x00d1a78d.0068.2c pxid: 0x0000.000.00000000
CHANGE #2 TYP:2 CLS:1 AFN:9 DBA:0x024002c5 OBJ:15750 SCN:0x0000.041b916a SEQ:1 OP:11.2 ENC:0 RBL:0
... (remaining change vectors omitted)

REDO RECORD - Thread:1 RBA: 0x00044d.00000004.0010 LEN: 0x00d0 VLD: 0x05
SCN: 0x0000.041b921e SUBSCN: 1 06/25/2013 11:27:34
(LWN RBA: 0x00044d.00000004.0010 LEN: 0001 NST: 0001 SCN: 0x0000.041b921d)
CHANGE #1 TYP:0 CLS:51 AFN:3 DBA:0x00c04c80 OBJ:4294967295 SCN:0x0000.041b921c SEQ:1 OP:5.4 ENC:0 RBL:0
ktucm redo: slt: 0x0016 sqn: 0x00002bee srt: 0 sta: 9 flg: 0x2
ktucf redo: uba: 0x00d1a78d.0068.2c ext: 104 spc: 2050 fbi: 0
DUMP LOG FILE: private redo and IMU enabled, commit_logging set to batch. The commit is merged into the transaction's redo record as a change vector instead of producing a separate redo record; it appears as CHANGE #3.
REDO RECORD - Thread:1 RBA: 0x00044d.00000002.0010 LEN: 0x0230 VLD: 0x05
SCN: 0x0000.041b921c SUBSCN: 1 06/25/2013 11:27:32
(LWN RBA: 0x00044d.00000002.0010 LEN: 0002 NST: 0001 SCN: 0x0000.041b921c)
CHANGE #1 TYP:0 CLS:51 AFN:3 DBA:0x00c04c80 OBJ:4294967295 SCN:0x0000.041b91d1 SEQ:1 OP:5.2 ENC:0 RBL:0
ktudh redo: slt: 0x0016 sqn: 0x00002bee flg: 0x0012 siz: 136 fbi: 0
uba: 0x00d1a78d.0068.2c pxid: 0x0000.000.00000000
CHANGE #2 TYP:2 CLS:1 AFN:9 DBA:0x024002c5 OBJ:15750 SCN:0x0000.041b916a SEQ:1 OP:11.2 ENC:0 RBL:0
CHANGE #3 TYP:0 CLS:51 AFN:3 DBA:0x00c04c80 OBJ:4294967295 SCN:0x0000.041b921c SEQ:1 OP:5.4 ENC:0 RBL:0
ktucm redo: slt: 0x0016 sqn: 0x00002bee srt: 0 sta: 9 flg: 0x2
ktucf redo: uba: 0x00d1a78d.0068.2c ext: 104 spc: 2050 fbi: 0
My feeling is that the commit_logging parameter has little practical effect; it may help shorten the window for ACID anomalies, and batch mode slightly reduces the redo volume. For ACID anomalies, see my other article: http://www.itpub.net/thread-1803010-1-1.html
PL/SQL commit optimization
Traditionally, when a user issues a commit, the user session waits on log file sync until lgwr has written the redo; once lgwr finishes, the foreground process is notified and continues with its remaining work. This mechanism guarantees the durability of the transaction, the D in ACID. But that is not how PL/SQL works: a commit inside PL/SQL does not wait for lgwr to finish writing before continuing with the rest of the work. Take a simple example:
begin
  for r in (select id from t1 where mod(id, 20) = 0) loop
    update t1 set small_no = small_no + .1 where id = r.id;
    commit;
  end loop;
end;
/
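The figures below were collected by comparing session-level and lgwr statistics before and after running the block; a hedged sketch of the kind of query involved (the :sid bind is a placeholder for the test session's SID):

-- Session-side statistics for the test session.
select sn.name, st.value
  from v$sesstat st join v$statname sn on sn.statistic# = st.statistic#
 where st.sid = :sid
   and sn.name in ('user commits', 'messages sent', 'redo synch writes');

-- LGWR-side statistics and waits can be read the same way using LGWR's SID
-- (e.g. found via v$session where program like '%LGWR%').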
The related session and system statistics are as follows:
user commits (session statistic)              25
messages sent (session statistic)              6
redo synch writes (session statistic)          1
log file sync (session events)                 1
messages received (lgwr statistic)             6
redo writes (lgwr statistic)                   6
log file parallel write (lgwr events)          6
lgwr wrote only six times (log file parallel write), and the user session waited on log file sync only once. That means that after the session issues a commit, it does not stop to wait for lgwr's write to complete; it simply carries on with the rest of the transaction. The user session is not honoring the durability of the transaction! If the instance crashes while the PL/SQL block is running, some committed transactions may be unrecoverable. Oracle's explanation is plausible: before the PL/SQL block finishes, you cannot know how many commits have been issued inside it, so Oracle does not guarantee their recoverability; only when the block ends does it increment redo synch writes and make the foreground process enter a log file sync wait. If you keep checking wait events while the block is running, you will not see any log file sync wait. The statistics also show that the session sent 6 messages to lgwr (requests to write the log), and lgwr received 6 messages (messages received) and performed 6 writes (log file parallel write). You might ask: how does the session decide when to send a write request? Each time the foreground process wants to post lgwr, it takes the redo writing latch and checks whether lgwr is already handling a write request. If lgwr is already writing, the foreground process does not post lgwr, does not wait on log file sync, and simply continues with its subsequent work. If lgwr is not writing, the foreground process posts lgwr to write, but still does not wait on log file sync and continues with the subsequent work. Only at the end of the PL/SQL block does it finally wait on log file sync once. So the faster your disks write, the more often lgwr ends up being posted (roughly proportional); and the faster your CPU and thus the shorter each loop iteration, the fewer times lgwr is posted (roughly inversely proportional, for a given disk write rate).
Note that if the PL/SQL block involves a DBLINK, the traditional commit mechanism is used and this "anomaly" does not occur.
One final note: although a PL/SQL block only waits for lgwr (and shows a log file sync wait) at the very end, do not conclude that if the instance crashes while the block is running, all the transactions it has processed will be lost. That is not the case: only part of them may be lost. While the block is running, the session keeps checking whether lgwr is working; if it is not, the session posts lgwr (messages sent), and once lgwr receives the request it writes the log. Any transaction whose redo has reached the log file is recoverable. In other words, although the foreground is not waiting on log file sync, lgwr has been busy writing your transaction logs in the background all along.
Ways to detect the problem
Method 1: from AWR, judge whether the system has a problem with log file sync and log file parallel write based on the avg wait (ms).
Method 2: use the MOATS tool to see the current top waits of the database and whether log file sync / log file parallel write show up. Tool download: http://www.oracle-developer.net/utilities.php.
Method 3: use Jonathan Lewis's snap_events script to get the wait counts and average wait times of system-level wait events:
rem execute snap_events.start_snap
rem execute snap_events.end_snap
Method 4: use Tanel Poder's session snapper script to sample the wait event distribution of the lgwr background process and the statistics related to it.
A summary of log file sync tuning from Master Bai (the "Eel", Lao Bai), shared for PUB (itpub) readers to learn from:
1. The average wait time of the log file sync event exceeds 7ms. If the wait is too long, each log write is taking too long. If you can optimize the storage of the redo log files, placing them on faster disks, you can reduce the single wait time of this event. (RAID 5 -> RAID 10)
What if optimizing the I/O of the redo logs does not solve the problem, or the result still falls short of expectations?
2. Experienced DBAs may recommend increasing the log buffer. Some readers may wonder what long redo log write times have to do with the size of the log buffer. It is actually easy to explain: if data file performance is poor and the average single-block read takes too long, we enlarge the db cache to reduce the total number of I/Os and thereby optimize I/O. Enlarging the log buffer follows the same principle: more redo can be held in the log buffer, which reduces the number of lgwr writes forced by the log buffer running short of space and increases the average number of bytes written to the redo log per write, thereby reducing the number of redo I/Os and easing the log file sync wait event.
3. If neither of the above works, there is another approach: reduce the number of commits. If you commit too frequently, no amount of other tuning will completely solve the problem. By committing more rows per transaction and committing fewer times, the log file sync wait time can be reduced effectively. This approach usually means substantial changes, possibly even to the application architecture, so it can be costly.
4. Another way to optimize log file sync is to make frequently committing transactions commit asynchronously. Asynchronous commit is a feature introduced in 10g and is controlled by the commit_write parameter.
The default behavior of commit_write is "immediate,wait"; setting it to "immediate,nowait" enables asynchronous commits.
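A hedged example of that setting at the system level (session level also works); as the next paragraph explains, do not use it for data you cannot afford to lose or re-derive.

-- Asynchronous commits: the session no longer waits on log file sync.
alter system set commit_write = 'IMMEDIATE,NOWAIT' scope=both;

-- Or only for the current session:
alter session set commit_write = 'IMMEDIATE,NOWAIT';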
A system that uses asynchronous commits needs additional verification and clean-up logic to detect inconsistent data and re-insert data lost because of the asynchronous commit. This requires special handling at the application level, including a verification mechanism and an error-data handling mechanism. Note that data which is particularly important and cannot be re-created afterwards is not suitable for this method.
The log file sync wait event is very important; in day-to-day database maintenance we should establish a baseline for this metric and analyze any abnormal change as soon as possible. Once this metric deteriorates, system performance can drop sharply or the system can even hang briefly. Last year, on one customer's system, log file sync was usually 2-3ms; during an inspection Lao Bai found it had risen to 7ms. The inspection report at the time advised the customer to watch the metric and check the storage system and operating system as soon as possible to find the cause of the slowdown. The customer checked the storage, found no fault, and the matter was dropped. At the next month's inspection the metric had risen to 13ms, and again nothing was found. It kept deteriorating over the following two months, rising to more than 20ms. Since nothing had been found in previous inspections and the system still seemed normal, the customer did not dig further. Then one day the system suddenly hung and only recovered five minutes later. The cause turned out to be log file sync waits. Following my suggestion the customer checked everything end to end and finally found an intermittently failing link under the LVM; after the link was repaired, everything returned to normal.
The lesson from this case: if the log file sync metric deteriorates, investigate the root cause as soon as possible; the longer the log file sync wait keeps climbing, the greater the chance of a system hang. Finding the cause early is imperative.
A summary of log file sync tuning from Master Gai (eygle), for PUBers' reference:
http://www.eygle.com/statspack/statspack14-LogFileSync.htm
When a user commits or rolls back, the session's redo information must be written to the redo logfile.
The user process notifies LGWR to perform the write, and LGWR notifies the user process once it has finished.
This wait event means the user process is waiting for LGWR's write-completion notification.
For a rollback, this event records the time from the user issuing the rollback command until the rollback completes.
Excessive waits on this event usually indicate that LGWR writes are slow or that the system commits too frequently.
To address this problem, pay attention to:
the log file parallel write wait event;
statistics such as user commits and user rollbacks, to observe the commit and rollback rates.
Solutions:
1. Improve LGWR performance: use fast disks and do not store redo log files on RAID 5.
2. Use batch commits.
3. Use options such as NOLOGGING/UNRECOVERABLE where appropriate.
The average redo write size can be calculated with the following formula:
avg. redo write size = (redo blocks written / redo writes) * 512 bytes
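A hedged query that computes this from v$sysstat (it assumes a 512-byte redo block size, which is the common case):

-- Average bytes written per lgwr write since instance startup.
select round((blk.value / nullif(wr.value, 0)) * 512) as avg_redo_write_bytes
  from v$sysstat blk, v$sysstat wr
 where blk.name = 'redo blocks written'
   and wr.name  = 'redo writes';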
If the system generates a lot of redo but each write is small, LGWR is generally being activated too frequently. This can lead to excessive contention on the redo-related latches, and Oracle may not be able to take full advantage of piggyback (group) commits.
Let's extract some data from a statspack report to study this problem.
Here log file sync and db file parallel write waits appear at the same time; clearly log file sync is being held back while db file parallel write completes. There must be a bottleneck in disk I/O: in this case the user's redo logs and data files were stored on the same RAID set, which caused the performance problem and needed to be adjusted.
Because of frequent commits and the overly frequent activation of LGWR, we also see redo writing latch contention here. A detailed introduction to redo writing latch contention can be found on Steve Adams's site:
http://www.ixora.com.au/notes/lgwr_latching.htm
Oracle Internals Notes
LGWR Latching
When LGWR wakes up, it first takes the redo writing latch to update the SGA variable that shows whether it is active. This prevents other Oracle processes from posting LGWR needlessly. LGWR then takes the redo allocation latch to determine how much redo might be available to write (subject to the release of the redo copy latches). If none, it takes the redo writing latch again to record that it is no longer active, before starting another rdbms ipc message wait.
If there is redo to write, LGWR then inspects the latch recovery areas for the redo copy latches (without taking the latches) to determine whether there are any incomplete copies into the log buffer. For incomplete copies above the sync RBA, LGWR just defers the writing of that block and subsequent log buffer blocks. For incomplete copies below the sync RBA, LGWR sleeps on a LGWR wait for redo copy wait event, and is posted when the required copy latches have been released. The time taken by LGWR to take the redo writing and redo allocation latches and to wait for the redo copy latches is accumulated in the redo writer latching time statistic.
(Prior to release 8i, foreground processes held the redo copy latches more briefly because they did not retain them for the application of the change vectors. Therefore, LGWR would instead attempt to assure itself that there were no ongoing copies into the log buffer by taking all the redo copy latches.)
After each redo write has completed, LGWR takes the redo allocation latch again in order to update the SGA variable containing the base disk block for the log buffer. This effectively frees the log buffer blocks that have just been written, so that they may be reused.
A summary of log file sync tuning from Master vage, for PUBers' reference:
1. Log file sync is the time from the start of a commit to its end. Log file parallel write is the time from LGWR starting to write the redo file until the write finishes. With that clear, you can see that log file sync contains log file parallel write, so whenever log file sync waits become a problem, look at log file parallel write first. If the average log file sync wait (i.e. the commit response time) is 20ms and log file parallel write is 19ms, the problem is obvious: redo file I/O is slow and is dragging down the commit process.
2. Log file sync time is not just log file parallel write. The server process starts the commit, notifies LGWR to write the redo, and LGWR notifies the process once the redo is written that the commit is complete; this back-and-forth notification also consumes CPU. Besides the notifications, a commit involves operations such as advancing the SCN. If there is a large gap between log file sync and log file parallel write, it shows that I/O is not the problem; instead, CPU resources may be so tight that the back-and-forth notification between the process and LGWR, or other CPU-dependent operations, cannot get CPU in time, causing the delay.
In that case, look at CPU usage and load. If the load is very high and CPU utilization is also high, then CPU is what lengthens the log file sync response time. The database will usually show secondary symptoms as well, for example latch/mutex contention heavier than usual, because when CPU is tight latch/mutex contention gets worse.
3. A large gap between log file sync and log file parallel write while CPU utilization is not high is relatively rare and belongs to the category of hard cases: log file parallel write responds quickly, yet log file sync is large. This is the hardest situation to pin down; compare all the redo-related statistics (data in v$sysstat) and the changes in the redo-related latches.
For example, look at the average response time of redo synch time: not every redo synch corresponds to a commit, but every commit must produce a redo synch. If redo synch responds quickly while log file sync is slow, something is wrong in the mutual notification phase between LGWR and the processes. The redo entries statistic is the number of times processes write redo into the log buffer. The statistics redo log space wait time and redo log space requests, and the log buffer space wait event, should also be watched. The size of the log buffer usually does not affect log file sync, but changes in the log buffer help you understand changes in the amount of redo.
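A hedged query pulling those redo-related statistics together for comparison; as with the histogram, compare two samples taken over an interval rather than the absolute values.

-- Redo-related statistics useful when log file sync and
-- log file parallel write diverge.
select name, value
  from v$sysstat
 where name in ('redo synch time', 'redo synch writes', 'redo entries',
                'redo log space requests', 'redo log space wait time',
                'redo size', 'redo writes');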
On the influence of the log buffer on log file sync:
Under the newer IMU mechanism, redo data sits first in the shared pool (private strands) and is copied into the log buffer at commit time; if there is a wait at that point, it is log buffer space. From the log buffer to disk, the wait event is log file sync.
The same holds under the old mechanism: the wait before the log buffer is log buffer space, and the wait after the log buffer is log file sync.
4. Control file I/O can affect log file sync. I have not studied this in depth, but I observed the phenomenon earlier on one of Alibaba's databases.
5. Log file sync and buffer busy waits. They are not directly related. Other causes, such as redo-related latch contention, can make log file sync and buffer busy waits appear at the same time; in that case neither of them is the real culprit, which is the degraded access performance of the log buffer.
6. Shipping redo to a remote Data Guard standby in synchronous mode also increases log file sync.
Redo is an important area of Oracle to understand. I have more or less worked out how DBWR works; LGWR is the next target, but I have not had time yet. I will write up a detailed summary later.