Why SQL Server instances are under pressure

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Translated from: https://www.simple-talk.com/sql/database-administration/why-is-that-sql-server-instance-under-stress/

When you encounter a performance problem on a SQL Server instance, a handful of metrics are enough to tell you the nature of the problem, so that you can quickly focus on the actual cause. Many different performance metrics can help you understand how your SQL Server instance is doing, and each of them can give you a strong indication of where to look. There may be internal or external memory pressure, excessive or irregular CPU load, or an I/O bottleneck. Once you have a strong indication of where the problem lies, you can dig into the details. For example, if you determine that SQL Server is under excessive CPU pressure, you then need to determine which queries consume the most CPU.

So what are these common indicators? It is useful to get direct information such as the total processor time, the CPU utilization on the server, or the average time spent reading data from disk. Beyond those, there are other ways to find out what is going on in the system, and they provide even more useful information. Let's take a closer look at some of these methods.

Is your instance under memory pressure?

sys.dm_os_ring_buffers

The ring buffers within the operating system are a collection point for specific types of system information. Most of it is obscure, and much of it relates to the management of the ring buffers themselves, but some of the buffered information is extremely interesting. For example, when the operating system detects that it is running low on memory, a message is recorded in a ring buffer. If there has been a memory warning, you can find it by querying the DMV sys.dm_os_ring_buffers.

There are actually two types of memory warning. You can get a warning about the physical memory of the machine. From SQL Server's perspective this is called external memory, because it is not memory managed by the SQL Server service. You can also get a warning about virtual memory, which is managed by the SQL Server service and is often called internal memory. When either one runs low, a warning is recorded in the ring buffer. You can also see notifications when memory is plentiful or when memory is growing significantly.

You just need to query the DMV like this:

SELECT * FROM sys.dm_os_ring_buffers AS dorb

That returns everything available, and you will quickly find that most of the interesting information is in the "record" column. This is a text column that stores XML, so you can use the following, more detailed query to extract the information you are interested in:

WITH RingBuffer AS (
    SELECT CAST(dorb.record AS XML) AS xRecord,
           dorb.timestamp
    FROM sys.dm_os_ring_buffers AS dorb
    WHERE dorb.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR'
)
SELECT xr.value('(ResourceMonitor/Notification)[1]', 'varchar(75)') AS RmNotification,
       xr.value('(ResourceMonitor/IndicatorsProcess)[1]', 'tinyint') AS IndicatorsProcess,
       xr.value('(ResourceMonitor/IndicatorsSystem)[1]', 'tinyint') AS IndicatorsSystem,
       DATEADD(ss, (-1 * ((dosi.cpu_ticks / CONVERT(FLOAT, (dosi.cpu_ticks / dosi.ms_ticks))) - rb.timestamp) / 1000), GETDATE()) AS RmDateTime,
       xr.value('(MemoryNode/TargetMemory)[1]', 'bigint') AS TargetMemory,
       xr.value('(MemoryNode/ReserveMemory)[1]', 'bigint') AS ReserveMemory,
       xr.value('(MemoryNode/CommittedMemory)[1]', 'bigint') AS CommitedMemory,
       xr.value('(MemoryNode/SharedMemory)[1]', 'bigint') AS SharedMemory,
       xr.value('(MemoryNode/PagesMemory)[1]', 'bigint') AS PagesMemory,
       xr.value('(MemoryRecord/MemoryUtilization)[1]', 'bigint') AS MemoryUtilization,
       xr.value('(MemoryRecord/TotalPhysicalMemory)[1]', 'bigint') AS TotalPhysicalMemory,
       xr.value('(MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS AvailablePhysicalMemory,
       xr.value('(MemoryRecord/TotalPageFile)[1]', 'bigint') AS TotalPageFile,
       xr.value('(MemoryRecord/AvailablePageFile)[1]', 'bigint') AS AvailablePageFile,
       xr.value('(MemoryRecord/TotalVirtualAddressSpace)[1]', 'bigint') AS TotalVirtualAddressSpace,
       xr.value('(MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS AvailableVirtualAddressSpace,
       xr.value('(MemoryRecord/AvailableExtendedVirtualAddressSpace)[1]', 'bigint') AS AvailableExtendedVirtualAddressSpace
FROM RingBuffer AS rb
CROSS APPLY rb.xRecord.nodes('Record') record (xr)
CROSS JOIN sys.dm_os_sys_info AS dosi
ORDER BY RmDateTime DESC

In this query, I first created a common table expression (CTE) called RingBuffer. It does only two things. First, it filters on one specific ring buffer type, RING_BUFFER_RESOURCE_MONITOR, which is where the memory information is recorded. Second, it converts the "record" column from text to XML. The main query then selects from the CTE and uses XQuery to pull all the information I'm interested in out of the XML.

One extra wrinkle: the timestamp column in sys.dm_os_ring_buffers does represent a date and time, but the value is derived from CPU ticks rather than stored as a datetime, so you have to use the formula in the query to convert it into a readable date and time.

When using sys.dm_os_ring_buffers as part of a monitoring process, you only need to look for two events: RESOURCE_MEMPHYSICAL_LOW and RESOURCE_MEMVIRTUAL_LOW. These are values of the ResourceMonitor/Notification element in the XML. They are an absolute indicator of a low memory condition on the machine, so if you get one of these warnings, you are low on either external (OS, physical) memory or internal (SQL Server, virtual) memory.
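For monitoring, the larger query can be cut down to just those two notifications. The following is a minimal sketch rather than a definitive implementation: it reuses the ring buffer type and XML path from the query above, and the column alias Notification is only an illustrative name.

```sql
-- Sketch: list only the low-memory notifications from the resource monitor ring buffer.
-- The alias "Notification" is illustrative, not part of the DMV.
WITH RingBuffer AS (
    SELECT CAST(dorb.record AS XML) AS xRecord,
           dorb.timestamp
    FROM sys.dm_os_ring_buffers AS dorb
    WHERE dorb.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR'
)
SELECT n.Notification,
       rb.timestamp
FROM RingBuffer AS rb
CROSS APPLY (
    -- Extract the notification text once so it can be filtered on
    SELECT rb.xRecord.value('(Record/ResourceMonitor/Notification)[1]', 'varchar(75)') AS Notification
) AS n
WHERE n.Notification IN ('RESOURCE_MEMPHYSICAL_LOW', 'RESOURCE_MEMVIRTUAL_LOW')
ORDER BY rb.timestamp DESC;
```

An empty result set means that no low-memory notification is currently held in the ring buffer. The timestamp is returned raw here; apply the conversion formula from the full query if you need a readable date and time.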

Is the system under load?

There are plenty of different ways to try to answer this question, but only a few let you know for sure whether or not you are under stress.

sys.dm_os_workers

One of my favorite ways to determine exactly how much work is going on in the system is to look at sys.dm_os_workers. This metric does not tell you what is causing the load on the system, nor does it allow you to understand the impact of the load. However, it is a perfect measure of the load on the system.

A large amount of information is returned from sys.dm_os_workers. This DMV returns information about the workers within the operating system. You can look at details of each worker, such as its last wait type, whether it has hit an exception, how many context switches it has experienced, and all kinds of other things. The online help documentation for the DMV even shows how to determine how long a worker has been runnable.

To use it simply as a measure of load, the query is extremely simple:

SELECT COUNT(*) FROM sys.dm_os_workers AS dow WHERE dow.state = 'RUNNING'

It's really that simple. As the number goes up and down, the load on the system goes up and down. If there are many more "RUNNING" workers today than there were yesterday, the load on your system has increased. But keep in mind that you need to make the comparison over time. Capturing a single number tells you nothing. You need at least two values to compare.
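Because the count only means something in comparison with earlier values, it helps to store each sample. The sketch below assumes a hypothetical table named dbo.WorkerLoadSample (the name and columns are invented for illustration) and uses LAG, which requires SQL Server 2012 or later.

```sql
-- Run once: a hypothetical table to hold the samples
CREATE TABLE dbo.WorkerLoadSample (
    SampleTime     DATETIME2(0) NOT NULL DEFAULT SYSDATETIME(),
    RunningWorkers INT          NOT NULL
);

-- Run on a schedule (for example from an Agent job): capture the current count
INSERT INTO dbo.WorkerLoadSample (RunningWorkers)
SELECT COUNT(*)
FROM sys.dm_os_workers AS dow
WHERE dow.state = 'RUNNING';

-- Compare each sample with the one before it
SELECT wls.SampleTime,
       wls.RunningWorkers,
       wls.RunningWorkers
           - LAG(wls.RunningWorkers) OVER (ORDER BY wls.SampleTime) AS ChangeSincePrevious
FROM dbo.WorkerLoadSample AS wls
ORDER BY wls.SampleTime;
```

The first row shows NULL for ChangeSincePrevious because there is nothing earlier to compare it with.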

sys.dm_os_schedulers

Another way to measure system load is to look at the schedulers. These are the processes that manage the workers. Again, this is an absolute measure of the load on the system. It can tell you how much work is being done on the system.

Looking at the schedulers produces a great deal of interesting information about the system being managed. You can see how many workers are being handled by a particular scheduler. You can see the number of times the scheduler yielded the CPU (giving up access to another process, because each process is only given limited access to the CPU), the number of workers currently active on the scheduler, and some other details.

However, to look at a load measure, you can run a very simple query:

SELECT COUNT(*) FROM sys.dm_os_schedulers AS dos WHERE dos.is_idle = 0

Again, this number makes sense only when compared with previous values. Using workers or schedulers as a measure of load is only as useful as the baseline data you have to compare against, but if you collect these values over a period of time, you can determine the load on the operating system.
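The per-scheduler details mentioned above (workers handled, yields, active workers) can be seen by selecting the individual columns instead of a count. This is a sketch only; the filter on status = 'VISIBLE ONLINE' is an addition made here to leave out hidden internal schedulers and is not part of the original query.

```sql
-- Sketch: per-scheduler view of the counters described in the text
SELECT dos.scheduler_id,
       dos.current_tasks_count,    -- tasks currently associated with this scheduler
       dos.runnable_tasks_count,   -- tasks waiting in the runnable queue for CPU
       dos.current_workers_count,  -- workers associated with this scheduler
       dos.active_workers_count,   -- workers that are currently active
       dos.yield_count,            -- counter that advances as the scheduler yields
       dos.is_idle
FROM sys.dm_os_schedulers AS dos
WHERE dos.status = 'VISIBLE ONLINE'  -- assumption: only schedulers that run user work
ORDER BY dos.scheduler_id;
```

As with the simple count, a single run is only a snapshot; the values become meaningful when compared across runs.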

Does SQL Server have enough memory?

DBCC MEMORYSTATUS

This is an amazing collection of data. What you get is the output of the various memory managers within SQL Server. You can see every piece of memory allocated and managed by SQL Server. This command is often used when you work with Microsoft customer support engineers to troubleshoot specific issues. However, it is also another way to determine exactly how memory is behaving on the system.

If you only run this command:

DBCC MEMORYSTATUS ()

You will see all the various memory allocation and management processes in SQL Server. All of them. In fact, there is so much information that trying to read it all quickly becomes meaningless. The good news is that you can target specific pieces of information. If we go specifically after the Target Committed value and the Current Committed value, we can determine whether SQL Server has enough memory. It's simple. If the Target value is higher than the Current value, you don't have the memory you need in SQL Server. But getting at these values is a bit of a headache. Here's one way:

DECLARE @MemStat TABLE (ValueName SYSNAME, Val BIGINT);

INSERT INTO @MemStat
EXEC ('DBCC MEMORYSTATUS() WITH TABLERESULTS');

WITH Measures AS (
    SELECT TOP 2
           CurrentValue,
           ROW_NUMBER() OVER (ORDER BY OrderColumn) AS RowOrder
    FROM (SELECT CASE WHEN (ms.ValueName = 'Target Committed') THEN ms.Val
                      WHEN (ms.ValueName = 'Current Committed') THEN ms.Val
                 END AS 'CurrentValue',
                 0 AS 'OrderColumn'
          FROM @MemStat AS ms) AS MemStatus
    WHERE CurrentValue IS NOT NULL
)
SELECT TargetMem.CurrentValue - CurrentMem.CurrentValue
FROM Measures AS TargetMem
JOIN Measures AS CurrentMem
    ON TargetMem.RowOrder + 1 = CurrentMem.RowOrder

I created a table variable and then used TABLERESULTS to load all the output from MEMORYSTATUS into it, which ensures that the output comes back as a table. By using a common table expression (CTE) to define the rows selected from the table variable, I can reference it twice in the SELECT statement and JOIN the two values based on ROW_NUMBER. It really works. If you get a negative value, you are looking at a memory problem.

Be aware that DBCC MEMORYSTATUS is a Microsoft support mechanism. It is not part of the standard tool set. This means it is subject to unannounced changes from one version of SQL Server to the next, or even from one service pack to the next. With that understanding, use it when you need to diagnose memory problems.

Do I need a better, faster disk system?

sys.dm_io_virtual_file_stats

This dynamic management function returns statistics about the behavior of the files in your databases. The most interesting pieces of information here are the stalls, or waits, that are gathered and made available. The simplest query looks like this:

SELECT * FROM sys.dm_io_virtual_file_stats (DB_ID (DB_NAME ()), NULL) AS divfs

You have to pass it two pieces of information. The first is the database ID, which here is obtained by using DB_NAME to identify the currently connected database and passing the result to DB_ID. The second is the file ID, for which I pass NULL so that data is returned for all the files.

The information returned is great, especially these four columns: sample_ms, io_stall_read_ms, io_stall_write_ms, and io_stall. Let's take a look at what they stand for, and you'll soon understand how interesting they are to you as a DBA. sample_ms is very straightforward: it is the number of milliseconds since the server was last started, and it provides the yardstick against which all the other values are understood. Next is io_stall_read_ms. This represents the total amount of time that processes have been forced to wait for read operations on this file. If you compare io_stall_read_ms with sample_ms, you get a measure of the percentage of time your application has spent waiting to read from a particular file in a particular database. You also get io_stall_write_ms, which represents the total amount of time processes have been waiting for write operations. You can collect these metrics over a period of time to see how they grow, or compare them with sample_ms in the same way as for reads. Finally, io_stall shows the total amount of wait time on that file for any I/O operation. Again, you can collect it over a period of time to see how it grows (because it will always grow), or you can get the percentage of time spent waiting on disk by comparing it with sample_ms.

These methods tell you exactly how serious I/O waits are on the system, but they cannot pinpoint specific queries. Instead, this approach focuses on determining whether there is something wrong with your system: whether you need more disks, faster disks, and so on.
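The comparison of the stall columns against sample_ms can be done in the query itself. This is a minimal sketch; the output aliases (ReadStallPct and so on) are illustrative names, and NULLIF is there only to avoid a divide-by-zero error.

```sql
-- Sketch: stall time as a percentage of the sample period, per file of the current database
SELECT DB_NAME(divfs.database_id) AS DatabaseName,
       divfs.file_id,
       divfs.sample_ms,
       divfs.io_stall_read_ms,
       divfs.io_stall_write_ms,
       divfs.io_stall,
       100.0 * divfs.io_stall_read_ms  / NULLIF(divfs.sample_ms, 0) AS ReadStallPct,
       100.0 * divfs.io_stall_write_ms / NULLIF(divfs.sample_ms, 0) AS WriteStallPct,
       100.0 * divfs.io_stall          / NULLIF(divfs.sample_ms, 0) AS TotalStallPct
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS divfs
ORDER BY divfs.io_stall DESC;
```

Treat the percentages as relative indicators rather than exact figures. Stall time is accumulated across all requests, so when several requests wait on the same file at once the result can exceed 100.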

How's CPU doing?

sys.dm_os_wait_stats

I list this as a little-known way to collect performance metrics, but it really isn't. By now everyone has heard that understanding what the server is waiting on is a good way to understand what is making it run slowly. But I still see a lot of people who are surprised that they can find this information.

sys.dm_os_wait_stats shows an aggregated view of what the server has been waiting on since it was last started (or since the wait statistics were last cleared). The information is broken down into specific wait types, some of which are really obscure. I won't try to document them, and not even Microsoft provides complete documentation for them. You will need to rely on web searches to identify what some wait types represent. Others are documented in the online help documentation, so take advantage of that great resource.

To query sys.dm_os_wait_stats, run a query like this:

SELECT * FROM sys.dm_os_wait_stats AS dows

The output has only five columns: wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, and signal_wait_time_ms. The only one that can't be understood immediately from its name is signal_wait_time_ms. This column represents the time between when a waiting thread was signaled and when it actually started executing. This time is included in the total, wait_time_ms. signal_wait_time_ms is therefore a measure of the time spent waiting to get onto the CPU, which makes it a good measure of how much load the CPU is under. Because of this, although you should normally examine the wait statistics as a whole, it is worth looking at signal_wait_time_ms on its own in order to understand how the CPU is performing. For that, you can use the following query:

SELECT SUM (dows.signal_wait_time_ms)

FROM sys.dm_os_wait_stats AS dows

This represents a cumulative total of the CPU waits that have occurred on the system. It is a good indicator of load, but you need to compare values captured at different times to see how it grows.
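Since signal_wait_time_ms is included in wait_time_ms, one way to put the total in context is to express it as a share of all wait time. The following is a sketch under that assumption; the aliases are illustrative, and no particular threshold is implied.

```sql
-- Sketch: signal (CPU) wait time as a share of all recorded wait time
SELECT SUM(dows.signal_wait_time_ms) AS SignalWaitMs,
       SUM(dows.wait_time_ms)        AS TotalWaitMs,
       100.0 * SUM(dows.signal_wait_time_ms)
             / NULLIF(SUM(dows.wait_time_ms), 0) AS SignalWaitPct
FROM sys.dm_os_wait_stats AS dows;
```

Both sums are cumulative since the last restart or since the statistics were cleared, so the percentage is an average over that whole period. Capturing it at intervals and comparing the results shows how CPU pressure changes over time.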

Summary

These are just some common examples of stress that you can check for in order to quickly focus on a specific aspect of the system. Using these methods, you can quickly identify or rule out the likely causes of a performance problem.

Each method provides a strong enough indication for you to confirm, for example, that there is memory pressure or that the CPU is under load. Once you understand the general nature of the stress, you need to dig into other metrics to find the specifics, such as which queries are consuming the most CPU.
