In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-31 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
Troubleshooting SQL Serer Latch and Latch timeout
Translated from: https://mssqlwiki.com/2012/09/07/latch-timeout-and-sql-server-latch/
In a multithreaded process, what happens when one thread updates a data or index page in memory while another thread is reading the same page?
What happens when the first thread reads a data or index page in memory and the second thread is releasing the same page from memory?
The answer is: we will get the results of inconsistent data or data structures. To avoid inconsistencies, SQL Server uses synchronization mechanisms such as Locks, Latches, and Spinlocks.
In this blog post, we will discuss some key points about latches and how to troubleshoot latch timeout dump.
What is a Latch?
They control concurrent access to data pages and structures through multithreading. Latches provide physical data consistency for data pages and provide synchronization of data results. Latches cannot be controlled by the user like locks.
Type of latch:
Buffer (BUF) Latch
Used to access BUF structures and their associated database pages synchronously.
Buffer "IO" Latch
A subset of Buffer Latch when the BUF and associated data / index pages are in the middle of an IO operation (reading pages from disk or writing pages to disk).
Non-Buffer (Non-BUF) Latch
These latches are used to synchronize general in-memory data structures, which are commonly used in query / task execution such as parallel threads, auto-grow operations, and shrink operations.
Latch mode:
Keep (KP) Latches
Used to ensure that the page is not freed from memory when it is in use.
Shared (SH) Latches
Used for read-only access to data structures and to prevent write access from other threads.
This mode allows shared access.
SH is compatible with KP, SH, and UP. It should be noted that although SH usually indicates read-only access, this is not always the case. The minimum schema requirement for Buffer Latches,SH is to read a data page.
Update (UP) Latches
Allow read access to data structures (compatible with SH and KP), but block other EX-latch access.
Used for write operations when page split detection is turned off and when AWE is not enabled.
Exclusive (EX) Latches
Prevents read activity that occurs on the latched structure. EX is only compatible with KP.
During reading IO and writing IO when page split detection is turned on or AWE is enabled.
Destroy (DT) Latches
Used when removing BUFS from Buffer Pool, either by adding them to the free list or by unmapping AWE Buffers.
Latch compatibility:
KP SH UP EX DT
KP Y Y Y Y N
SH Y Y Y N N
UP Y Y N N N
EX Y N N N N
DT N N N N N
How to identify latch contention?
Latch contention can be identified by the type of wait in the sysprocesses.
PAGEIOLATCH_*:
The wait type in this sysprocesses indicates that the SQL Server is waiting for the physical Iripple O of a Buffer Pool page to complete.
1.PAGEIOLATCHQ * is usually resolved by tuning a query that performs a large number of IO operations (usually introducing the number of physical IO by adding, modifying, and removing indexes or statistics).
two。 Identify disk bottlenecks and fix them (PAGEIOLATCH wait time (for example, greater than 30ms).
PAGELATCH_*:
The wait type in this sysprocesses indicates that SQL Server is waiting to access a database page, but the page does not experience a physical IO.
1. This problem is usually caused by a large number of sessions trying to access the same physical page at the same time. You should check the waiting resources for spid. This wait_resource is the page number accessed (formatted as dbid:file:pageno).
two。 We can use DBCC PAGE to identify objects or page types where contention occurs. It also helps us determine whether contention is used for allocation, data, or text.
3. If the page that SQL Server waits most frequently is in the tempdb database, check the waiting resource column for a page number for 2 in dbid. You may face the tempdb latch contention mentioned here: http://support.microsoft.com/kb/328551
4. If the page is in a user database, check to see if the table has a clustered index on a monotonous key identity column, and all threads are competing for the same page at the end of the table. In this case, we need to select a different aggregation key to distribute the work to different pages.
LATCH_*:
Non-buf latch waiting can be caused by a variety of things. We can use this wait resource column in sysprocesses to determine the type of latch included (KB 822101).
1. A very common LATCH_EX wait is due to running a Profiler trace or sp_trace_getdata referencing KB 929728 for more information.
two。 Automatic growth and automatic contraction.
When a latch is requested by a thread, and if the latch cannot be granted immediately because other threads hold an incompatible latch on the same page or data structure, the requester must wait for the latch to be granted. If the wait interval reaches 5 minutes (waittime 300), a warning message similar to the following is output in the SQL Server error log, and a mini dump for all threads is captured. Warning messages differ between buffer and non-buffer latches.
Time out occurred while waiting for buffer latch-type% d, bp% p, page% dvv% d, stat% # x, database id:% d, allocation unit id:% I64d%ls, task 0x%p:% d, waittime% d, flags 0x%I64x, owning task 0x%p. Continuing to wait.846: a time-out occurred while waiting for buffer latch-type% d, bp% p, page% dvv% d, stat% # x, database id:% d, allocation unit Id:% I64d%ls Task 0x%p:% d, waittime% d, flags 0x%I64x, owning task 0x%p. Not continuing to wait.847: Timeout occurred while waiting for latch: class'% ls', id% p, type% d, Task 0x%p:% d, waittime% d, flags 0x%I64x, owning task 0x%p. Continuing to wait.
Split the above warning
Type (type):
The current latch gets the requested latch mode. This one uses the following matching numeric value:
0-NL (not used), 1-KP; 2-SH; 3-UP; 4-EX; 5-DT.
Task (task):
We try to get the task of latch.
Wait time (Waittime):
The total time in seconds to wait for the latch to get the request.
Tasks you have (owning task):
Available task addresses with latches.
Bp (Buffer latches only):
The address of the BUF structure that corresponds to Buffer latch.
Page (Buffer latches only):
The page ID for the page currently contained in the BUF structure.
Database id (Buffer latches only):
For the database ID of pages in BUF.
Just like troubleshooting blocking problems in SQL Server, when there is a latch contention or timeout dump, identify the owner of the latch and troubleshoot why the latch has been held by that owner for a long time.
When there is a latch timeout dump, you can see a warning message similar to one of the following. Before dump is important to find the owner thread of the latch, the warning error message is printed in the SQL Server error log.
2012-01-18 00 spid69 A time-out occurred while waiting for buffer latch-type 4, bp 00000000ECFDAA00, page 1 type 6088, stat 0x4c1010f, database id: 4, allocation unit Id: 72057594043367424, task 0x0000000006E096D8: 0, waittime 300, flags 0x19 owning task 0x0000000006E08328. Not continuing to wait.spid21s * * Dump thread-spid = 21, PSS = 0x0000000094622B60 EC = 0x0000000094622B70spid21s * Stack Dump being sent to E:\ Data\ Disk1\ MSSQL.1\ MSSQL\ LOG\ SQLDump0009.txtspid21s * spid21s * BEGIN STACK DUMP:spid21s * 02 28 spid 21spid21s 12 00:32:03 spid 21spid21s * Latch timeoutTimeout occurred while waiting for latch: class' ACCESS_METHODS_HOBT_COUNT' Id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8: 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.Timeout occurred while waiting for latch: class' ACCESS_METHODS_HOBT_VIRTUAL_ROOT', id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8: 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.
From the above error message, we can easily understand that we are trying to request a latch at database ID 4, page 1 0x0000000006E08328 6088 (page 6088 of the first file), and timed out because the task 0x0000000006E08328 (which owns the task 0x0000000006E08328 in the warning message) is holding a latch on it.
Note: a task is just a work request executed by a thread. (like system task, login task, ghost cleanup task, etc.) The thread performing this task will hold the required latches as needed.
Let's look at how to analyze the latch timeout dump and the owning thread that uses the owning task 0x0000000006E08328 to acquire the latch.
To analyze dump, download and install Windows Debugger from http://sdrv.ms/MO6ytG.
Step 1:
Open Windbg. Select the "File" menu, select "Open crash dump", and select "Dump file" (SQLDump000#.mdmp).
Step 2:
Enter in the command line window
.sympath srv*c:\ Websymbols* http://msdl.microsoft.com/download/symbols;
Step 3:
Enter .reload / f and enter. This forces debugger to load all symbols immediately.
Step 4:
Verify that the symbol is loaded by SQL Server by using the debugger command lmvm.
0RU 002 > lmvm sqlservr
Start end module name
000000`01000000 00000000`03679000 sqlservr T (pdb symbols) c:\ websymbols\ sqlservr.pdb\ 21E4AC6E96294A529C9D99826B5A7C032\ sqlservr.pdb
Loaded symbol p_w_picpath file: sqlservr.exe
Image path: C:\ Program Files\ Microsoft SQL Server\ MSSQL.1\ MSSQL\ Binn\ sqlservr.exe
Image name: sqlservr.exe
Timestamp: Wed Oct 07 21:15:52 2009 (4ACD6778)
CheckSum: 025FEB5E
ImageSize: 02679000
File version: 2005.90.4266.0
Product version: 9.0.4266.0
File flags: 0 (Mask 3F)
File OS: 40000 NT Base
File type: 1.0 App
File date: 00000000.00000000
Translations: 0000.04b0 400 0409.04b0 4.0904 million
Step 5:
Use the following command to search the thread stack to identify the thread associated with the task you own, and it is the thread that owns the latch. Replace 0X0000000006E08328 with the tasks you have in your error log
~ * e .echo ThreadId:;?? @ $tid; r? @ $T1 = ((StackLimit; r) @ $teb)-> StackLimit; r? @ $T2 = ((NTD r) @ $teb)-> StackBase; s-d @ $T1 @ $T2
ThreadId:
Unsigned int 0x93c
ThreadId:
Unsigned int 0x9a0
ThreadId:
Unsigned int 0x9b4
00000000`091fdaf0 06e08328 00000000 00000000 00000000 (…
00000000`091fdcb8 06e08328 00000000 091fdd70 00000000 .p. .
00000000`091fded0 06e08328 00000000 06e0e798 00000000 (…
00000000`091fdf38 06e08328 00000000 00000002 00000000 (…
000000`091fec60 06e08328 00000000 0168883a 00000000 .h. ..
00000000`091ff260 06e08328 00000000 000007d0 00000000 (…
00000000`091ff2d0 06e08328 00000000 00000020 00000000 . …… .
000000`091ff5f8 06e08328 00000000 800306c0 00000000 (…
00000000`091ff6c0 06e08328 00000000 00000000 00000000 (…
00000000`091ff930 06e08328 00000000 00000000 00000001 (…
000000`091ff9b8 06e08328 00000000 00000000 00000000 (…
00000000`091ffa38 06e08328 00000000 00000000 00000000 (…
00000000`091ffc10 06e08328 00000000 03684080 00000000 .. @ h... ..
00000000`091ffc90 06e08328 00000000 00000000 00000000 (…
ThreadId:
Unsigned int 0x9b8
ThreadId:
Unsigned int 0x9bc
ThreadId:
Unsigned int 0x9c0
...
. ..
Step 6:
From the above output, we can see that the thread 0x9b4 is related to the pointer to the task it owns, and it is the thread that owns the latch. Let's switch to the thread (0x9b4), which is performing the task it has, and then browse the stack to see why the thread has held the latch for a long time.
Step 7:
~ [0x9b4] s = > Switching to the thread (Replace 0x9b4 with your thread id which has reference to the po
Ntdllchecked ZwWaitForSingleObjectroom0xa:
00000000`77ef047a c3 ret
Step 8:
0 002 > kC = = > Print the stack
Call Site
Ntdll!ZwWaitForSingleObject
Kernel32!WaitForSingleObjectEx
Sqlservr!SOS_Scheduler::SwitchContext
Sqlservr!SOS_Scheduler::Suspend
Sqlservr!SOS_Event::Wait
Sqlservr!BPool::FlushCache
Sqlservr!checkpoint2
Sqlservr!alloca_probe
Sqlservr!ProcessCheckpointRequest
Sqlservr!CheckpointLoop
Sqlservr!ckptproc
Sqlservr!SOS_Task::Param::Execute
Sqlservr!SOS_Scheduler::RunTask
Sqlservr!SOS_Scheduler::ProcessTasks
Sqlservr!SchedulerManager::WorkerEntryPoint
Sqlservr!SystemThread::RunWorker
Sqlservr!SystemThreadDispatcher::ProcessWorker
Sqlservr!SchedulerManager::ThreadEntryPoint
Msvcr80!endthreadex
Msvcr80!endthreadex
From the above stack, we can understand that the thread with the latch is performing a checkpoint and flushing the cache (dirty pages) to disk. If it takes a long time to flush the cache to disk (checkpoint), there is obviously a disk bottleneck.
Similarly, for other latch timeout problems, first identify the owner thread of the latch, read the owner thread's stack to understand the tasks performed by the owner thread, and troubleshoot performance problems caused by tasks performed by the owner thread.
If you want to view the stack of waiting threads, get the task (task 0x0000000006E096D8) from the latch timeout warning message in the error log instead of the owner task (task 0x0000000006E08328), and use the command mentioned in step 5.
I hope this blog post will help you learn and troubleshoot latch timeouts.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.