Troubleshooting SQL Serer Latch and Latch timeout 04/25 Update SLTechnology News&Howtos

Troubleshooting SQL Serer Latch and Latch timeout

2025-04-25 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Troubleshooting SQL Serer Latch and Latch timeout

Translated from: https://mssqlwiki.com/2012/09/07/latch-timeout-and-sql-server-latch/

In a multithreaded process, what happens when one thread updates a data or index page in memory while another thread is reading the same page?

What happens when the first thread reads a data or index page in memory and the second thread is releasing the same page from memory?

The answer is: we will get the results of inconsistent data or data structures. To avoid inconsistencies, SQL Server uses synchronization mechanisms such as Locks, Latches, and Spinlocks.

In this blog post, we will discuss some key points about latches and how to troubleshoot latch timeout dump.

What is a Latch?

They control concurrent access to data pages and structures through multithreading. Latches provide physical data consistency for data pages and provide synchronization of data results. Latches cannot be controlled by the user like locks.

Type of latch:

Buffer (BUF) Latch

Used to access BUF structures and their associated database pages synchronously.

Buffer "IO" Latch

A subset of Buffer Latch when the BUF and associated data / index pages are in the middle of an IO operation (reading pages from disk or writing pages to disk).

Non-Buffer (Non-BUF) Latch

These latches are used to synchronize general in-memory data structures, which are commonly used in query / task execution such as parallel threads, auto-grow operations, and shrink operations.

Latch mode:

Keep (KP) Latches

Used to ensure that the page is not freed from memory when it is in use.

Shared (SH) Latches

Used for read-only access to data structures and to prevent write access from other threads.

This mode allows shared access.

SH is compatible with KP, SH, and UP. It should be noted that although SH usually indicates read-only access, this is not always the case. The minimum schema requirement for Buffer Latches,SH is to read a data page.

Update (UP) Latches

Allow read access to data structures (compatible with SH and KP), but block other EX-latch access.

Used for write operations when page split detection is turned off and when AWE is not enabled.

Exclusive (EX) Latches

Prevents read activity that occurs on the latched structure. EX is only compatible with KP.

During reading IO and writing IO when page split detection is turned on or AWE is enabled.

Destroy (DT) Latches

Used when removing BUFS from Buffer Pool, either by adding them to the free list or by unmapping AWE Buffers.

Latch compatibility:

KP SH UP EX DT

KP Y Y Y Y N

SH Y Y Y N N

UP Y Y N N N

EX Y N N N N

DT N N N N N

How to identify latch contention?

Latch contention can be identified by the type of wait in the sysprocesses.

PAGEIOLATCH_*:

The wait type in this sysprocesses indicates that the SQL Server is waiting for the physical Iripple O of a Buffer Pool page to complete.

1.PAGEIOLATCHQ * is usually resolved by tuning a query that performs a large number of IO operations (usually introducing the number of physical IO by adding, modifying, and removing indexes or statistics).

two。 Identify disk bottlenecks and fix them (PAGEIOLATCH wait time (for example, greater than 30ms).

PAGELATCH_*:

The wait type in this sysprocesses indicates that SQL Server is waiting to access a database page, but the page does not experience a physical IO.

1. This problem is usually caused by a large number of sessions trying to access the same physical page at the same time. You should check the waiting resources for spid. This wait_resource is the page number accessed (formatted as dbid:file:pageno).

two。 We can use DBCC PAGE to identify objects or page types where contention occurs. It also helps us determine whether contention is used for allocation, data, or text.

3. If the page that SQL Server waits most frequently is in the tempdb database, check the waiting resource column for a page number for 2 in dbid. You may face the tempdb latch contention mentioned here: http://support.microsoft.com/kb/328551

4. If the page is in a user database, check to see if the table has a clustered index on a monotonous key identity column, and all threads are competing for the same page at the end of the table. In this case, we need to select a different aggregation key to distribute the work to different pages.

LATCH_*:

Non-buf latch waiting can be caused by a variety of things. We can use this wait resource column in sysprocesses to determine the type of latch included (KB 822101).

1. A very common LATCH_EX wait is due to running a Profiler trace or sp_trace_getdata referencing KB 929728 for more information.

two。 Automatic growth and automatic contraction.

When a latch is requested by a thread, and if the latch cannot be granted immediately because other threads hold an incompatible latch on the same page or data structure, the requester must wait for the latch to be granted. If the wait interval reaches 5 minutes (waittime 300), a warning message similar to the following is output in the SQL Server error log, and a mini dump for all threads is captured. Warning messages differ between buffer and non-buffer latches.

Time out occurred while waiting for buffer latch-type% d, bp% p, page% dvv% d, stat% # x, database id:% d, allocation unit id:% I64d%ls, task 0x%p:% d, waittime% d, flags 0x%I64x, owning task 0x%p. Continuing to wait.846: a time-out occurred while waiting for buffer latch-type% d, bp% p, page% dvv% d, stat% # x, database id:% d, allocation unit Id:% I64d%ls Task 0x%p:% d, waittime% d, flags 0x%I64x, owning task 0x%p. Not continuing to wait.847: Timeout occurred while waiting for latch: class'% ls', id% p, type% d, Task 0x%p:% d, waittime% d, flags 0x%I64x, owning task 0x%p. Continuing to wait.

Split the above warning

Type (type):

The current latch gets the requested latch mode. This one uses the following matching numeric value:

0-NL (not used), 1-KP; 2-SH; 3-UP; 4-EX; 5-DT.

Task (task):

We try to get the task of latch.

Wait time (Waittime):

The total time in seconds to wait for the latch to get the request.

Tasks you have (owning task):

Available task addresses with latches.

Bp (Buffer latches only):

The address of the BUF structure that corresponds to Buffer latch.

Page (Buffer latches only):

The page ID for the page currently contained in the BUF structure.

Database id (Buffer latches only):

For the database ID of pages in BUF.

Just like troubleshooting blocking problems in SQL Server, when there is a latch contention or timeout dump, identify the owner of the latch and troubleshoot why the latch has been held by that owner for a long time.

When there is a latch timeout dump, you can see a warning message similar to one of the following. Before dump is important to find the owner thread of the latch, the warning error message is printed in the SQL Server error log.

2012-01-18 00 spid69 A time-out occurred while waiting for buffer latch-type 4, bp 00000000ECFDAA00, page 1 type 6088, stat 0x4c1010f, database id: 4, allocation unit Id: 72057594043367424, task 0x0000000006E096D8: 0, waittime 300, flags 0x19 owning task 0x0000000006E08328. Not continuing to wait.spid21s * * Dump thread-spid = 21, PSS = 0x0000000094622B60 EC = 0x0000000094622B70spid21s * Stack Dump being sent to E:\ Data\ Disk1\ MSSQL.1\ MSSQL\ LOG\ SQLDump0009.txtspid21s * spid21s * BEGIN STACK DUMP:spid21s * 02 28 spid 21spid21s 12 00:32:03 spid 21spid21s * Latch timeoutTimeout occurred while waiting for latch: class' ACCESS_METHODS_HOBT_COUNT' Id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8: 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.Timeout occurred while waiting for latch: class' ACCESS_METHODS_HOBT_VIRTUAL_ROOT', id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8: 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

From the above error message, we can easily understand that we are trying to request a latch at database ID 4, page 1 0x0000000006E08328 6088 (page 6088 of the first file), and timed out because the task 0x0000000006E08328 (which owns the task 0x0000000006E08328 in the warning message) is holding a latch on it.

Note: a task is just a work request executed by a thread. (like system task, login task, ghost cleanup task, etc.) The thread performing this task will hold the required latches as needed.

Let's look at how to analyze the latch timeout dump and the owning thread that uses the owning task 0x0000000006E08328 to acquire the latch.

To analyze dump, download and install Windows Debugger from http://sdrv.ms/MO6ytG.

Step 1:

Open Windbg. Select the "File" menu, select "Open crash dump", and select "Dump file" (SQLDump000#.mdmp).

Step 2:

Enter in the command line window

.sympath srv*c:\ Websymbols* http://msdl.microsoft.com/download/symbols;

Step 3:

Enter .reload / f and enter. This forces debugger to load all symbols immediately.

Step 4:

Verify that the symbol is loaded by SQL Server by using the debugger command lmvm.

0RU 002 > lmvm sqlservr

Start end module name

000000`01000000 00000000`03679000 sqlservr T (pdb symbols) c:\ websymbols\ sqlservr.pdb\ 21E4AC6E96294A529C9D99826B5A7C032\ sqlservr.pdb

Loaded symbol p_w_picpath file: sqlservr.exe

Image path: C:\ Program Files\ Microsoft SQL Server\ MSSQL.1\ MSSQL\ Binn\ sqlservr.exe

Image name: sqlservr.exe

Timestamp: Wed Oct 07 21:15:52 2009 (4ACD6778)

CheckSum: 025FEB5E

ImageSize: 02679000

File version: 2005.90.4266.0

Product version: 9.0.4266.0

File flags: 0 (Mask 3F)

File OS: 40000 NT Base

File type: 1.0 App

File date: 00000000.00000000

Translations: 0000.04b0 400 0409.04b0 4.0904 million

Step 5:

Use the following command to search the thread stack to identify the thread associated with the task you own, and it is the thread that owns the latch. Replace 0X0000000006E08328 with the tasks you have in your error log

~ * e .echo ThreadId:;?? @ $tid; r? @ $T1 = ((StackLimit; r) @ $teb)-> StackLimit; r? @ $T2 = ((NTD r) @ $teb)-> StackBase; s-d @ $T1 @ $T2

ThreadId:

Unsigned int 0x93c

ThreadId:

Unsigned int 0x9a0

ThreadId:

Unsigned int 0x9b4

00000000`091fdaf0 06e08328 00000000 00000000 00000000 (…

00000000`091fdcb8 06e08328 00000000 091fdd70 00000000 .p. .

00000000`091fded0 06e08328 00000000 06e0e798 00000000 (…

00000000`091fdf38 06e08328 00000000 00000002 00000000 (…

000000`091fec60 06e08328 00000000 0168883a 00000000 .h. ..

00000000`091ff260 06e08328 00000000 000007d0 00000000 (…

00000000`091ff2d0 06e08328 00000000 00000020 00000000 . …… .

000000`091ff5f8 06e08328 00000000 800306c0 00000000 (…

00000000`091ff6c0 06e08328 00000000 00000000 00000000 (…

00000000`091ff930 06e08328 00000000 00000000 00000001 (…

000000`091ff9b8 06e08328 00000000 00000000 00000000 (…

00000000`091ffa38 06e08328 00000000 00000000 00000000 (…

00000000`091ffc10 06e08328 00000000 03684080 00000000 .. @ h... ..

00000000`091ffc90 06e08328 00000000 00000000 00000000 (…

ThreadId:

Unsigned int 0x9b8

ThreadId:

Unsigned int 0x9bc

ThreadId:

Unsigned int 0x9c0

...

. ..

Step 6:

From the above output, we can see that the thread 0x9b4 is related to the pointer to the task it owns, and it is the thread that owns the latch. Let's switch to the thread (0x9b4), which is performing the task it has, and then browse the stack to see why the thread has held the latch for a long time.

Step 7:

~ [0x9b4] s = > Switching to the thread (Replace 0x9b4 with your thread id which has reference to the po

Ntdllchecked ZwWaitForSingleObjectroom0xa:

00000000`77ef047a c3 ret

Step 8:

0 002 > kC = = > Print the stack

Call Site

Ntdll!ZwWaitForSingleObject

Kernel32!WaitForSingleObjectEx

Sqlservr!SOS_Scheduler::SwitchContext

Sqlservr!SOS_Scheduler::Suspend

Sqlservr!SOS_Event::Wait

Sqlservr!BPool::FlushCache

Sqlservr!checkpoint2

Sqlservr!alloca_probe

Sqlservr!ProcessCheckpointRequest

Sqlservr!CheckpointLoop

Sqlservr!ckptproc

Sqlservr!SOS_Task::Param::Execute

Sqlservr!SOS_Scheduler::RunTask

Sqlservr!SOS_Scheduler::ProcessTasks

Sqlservr!SchedulerManager::WorkerEntryPoint

Sqlservr!SystemThread::RunWorker

Sqlservr!SystemThreadDispatcher::ProcessWorker

Sqlservr!SchedulerManager::ThreadEntryPoint

Msvcr80!endthreadex

From the above stack, we can understand that the thread with the latch is performing a checkpoint and flushing the cache (dirty pages) to disk. If it takes a long time to flush the cache to disk (checkpoint), there is obviously a disk bottleneck.

Similarly, for other latch timeout problems, first identify the owner thread of the latch, read the owner thread's stack to understand the tasks performed by the owner thread, and troubleshoot performance problems caused by tasks performed by the owner thread.

If you want to view the stack of waiting threads, get the task (task 0x0000000006E096D8) from the latch timeout warning message in the error log instead of the owner task (task 0x0000000006E08328), and use the command mentioned in step 5.

I hope this blog post will help you learn and troubleshoot latch timeouts.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.