How to use tools to quickly locate database problems

2025-07-03 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article shows how to use tools to quickly locate database problems. It is short and easy to follow, and I hope you get something out of it.

We often get feedback from the business side: "Hello? The xx port feels a little slow!" Once you have confirmed that nothing is wrong with the database itself (it is not down and reports no errors), you need to check the server's metrics to troubleshoot the problem. Today I would like to introduce a very useful tool for this: sar.

Common problems

In daily database operation and maintenance, we often run into the following situations:

The database is sluggish

The server is overloaded

The server restarts unexpectedly

When this happens, we usually use tools to check the state of the server.

Powerful sysstat toolkit

Sysstat is a software package that contains a set of tools for monitoring system performance and efficiency.

There are two ways to install:

sudo yum install sysstat

git clone git://github.com/sysstat/sysstat

Included tools

The sysstat tool set contains the following common tools:

iostat: monitors the I/O load of system devices.

mpstat: shows the status of each available CPU in a multi-CPU environment.

pidstat: monitors the system resources consumed by all or specified processes.

sar: one of the most comprehensive system performance analysis tools on Linux; it reports system activity from many angles.

Today we will mainly introduce sar.

sar

sar (System Activity Reporter) is one of the most comprehensive system performance analysis tools on Linux. It samples the current state of the system and then presents the system's running status from many angles as computed values and percentages. This covers file reads and writes, system call usage, disk I/O, CPU efficiency, memory usage, process activity, and IPC (inter-process communication) activity.

Characteristics

The system can be sampled continuously and a large number of sampling data can be obtained.

The sampled data and the analysis results can be stored in a file, with very little overhead.

sar provides a wide range of options.

sar statistics items

Disk I / O and data transfer rate statistics

CPU statistics

Memory, huge pages and swap space utilization statistics

Virtual memory, paging, and page fault statistics

Network statistics

Process creation statistics

Interrupt statistics

Traffic statistics for Fibre Channel

NFS server and client activity statistics

Socket statistics

Queue and system load statistics

Kernel internal table statistics

TTY activity statistics

File system utilization statistics

As the list shows, sar covers a lot of ground. Today we will only introduce the few parameters that help when troubleshooting database problems.

1 Troubleshoot CPU problems

Use the -u or -P parameter (-P ALL reports each CPU separately)

Report statistics for CPU

Output item description:

CPU: "all" means the statistics are averaged over all CPUs.

%user: percentage of total CPU time spent running at the user level (applications).

%nice: percentage of total CPU time spent at the user level on processes whose priority was changed with nice (the nice command changes the priority of a process).

%system: percentage of total CPU time spent running at the kernel level.

%iowait: percentage of total CPU time spent idle while waiting for I/O operations to complete.

%steal: percentage of time the virtual CPU spent waiting involuntarily while the hypervisor was servicing another virtual processor.

%idle: percentage of total CPU time during which the CPU was idle.

Analysis:

If %iowait is too high, the disks probably have an I/O bottleneck.

If %idle is high but the system responds slowly, the CPU may be waiting for memory to be allocated; in that case, consider adding memory.

If %idle stays below 1, the system's CPU processing capacity is relatively low, and CPU is the resource that most needs attention.
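The rules of thumb above can be sketched as a small check on one data line of `sar -u` output. This is only an illustration under stated assumptions: the helper names `parse_cpu_line` and `triage_cpu`, the 20% %iowait threshold, and the sample line are all made up for the example; only the "%idle below 1" rule comes from the text.

```python
# Column order of a classic `sar -u` data line after the timestamp.
CPU_FIELDS = ["cpu", "user", "nice", "system", "iowait", "steal", "idle"]


def parse_cpu_line(line):
    """Turn one `sar -u` data line into a dict keyed by field name."""
    parts = line.split()
    # Skip the timestamp (and the AM/PM marker if the locale prints one).
    start = 2 if parts[1] in ("AM", "PM") else 1
    values = parts[start:]
    row = {"cpu": values[0]}
    for name, raw in zip(CPU_FIELDS[1:], values[1:]):
        row[name] = float(raw)
    return row


def triage_cpu(row, iowait_limit=20.0, idle_floor=1.0):
    """Return hints for one sample. iowait_limit is an assumed threshold."""
    hints = []
    if row["iowait"] > iowait_limit:
        hints.append("high %iowait: check disk I/O")
    if row["idle"] < idle_floor:
        hints.append("%idle below 1: CPU is the likely bottleneck")
    return hints


# Made-up sample line, for illustration only.
sample = "10:00:01 all 12.50 0.00 3.10 35.20 0.00 49.20"
print(triage_cpu(parse_cpu_line(sample)))
```

A single sample proves little; in practice you would apply a check like this to a series of samples and look for values that stay high.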

Use the -q parameter

Report process queue length and average load status.

Output item description:

runq-sz: length of the run queue (number of tasks waiting to run)

plist-sz: number of processes and threads in the task list

ldavg-1: average system load over the past minute

ldavg-5: average system load over the past 5 minutes

ldavg-15: average system load over the past 15 minutes
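Load averages are easier to judge relative to the number of CPUs. The sketch below normalizes the three ldavg values by CPU count; the "more than 1.0 per CPU means overloaded" limit is a common rule of thumb assumed here, not something sar reports, and the function names and sample numbers are hypothetical.

```python
import os


def load_per_cpu(ldavg, cpu_count=None):
    """Normalize a load average by the number of CPUs."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return ldavg / cpu_count


def is_overloaded(ldavg_1, ldavg_5, ldavg_15, cpu_count=None, limit=1.0):
    """True when all three averages exceed `limit` per CPU (assumed rule)."""
    averages = (ldavg_1, ldavg_5, ldavg_15)
    return all(load_per_cpu(v, cpu_count) > limit for v in averages)


# Made-up values as they might appear in the ldavg-1/5/15 columns.
print(is_overloaded(9.8, 9.1, 8.7, cpu_count=8))  # sustained load on 8 CPUs
print(is_overloaded(9.8, 3.0, 2.0, cpu_count=8))  # only a short spike
```

Requiring all three windows to be high separates a sustained overload from a brief burst, which is usually what matters when a database has been slow for a while.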

2 Troubleshoot memory problems

Use the-r parameter

Report memory and swap space usage

Output item description:

kbmemfree: free memory (KB). This value is basically the same as the free value in the free command, so it does not include buffer and cache space.

kbmemused: used memory (KB), not counting memory used by the kernel itself. This value is basically the same as the used value in the free command, so it includes buffer and cache space.

%memused: percentage of memory in use, that is, kbmemused as a percentage of total memory (excluding swap).

kbbuffers: memory used by the kernel as buffers (KB)

kbcached: memory used by the kernel to cache data (KB)

kbbuffers and kbcached correspond to buffers and cache in the free command.

kbswpfree: free swap space (KB)

kbswpused: used swap space (KB)

%swpused: percentage of swap space in use

kbswpcad: cached swap memory (KB), that is, memory that was swapped out and then swapped back in but still has a copy in the swap area
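Because kbmemused includes buffer and cache space in the layout described above, %memused can look alarming on a healthy database server. The sketch below recomputes the percentage with and without buffers and cache. It assumes total memory equals kbmemfree + kbmemused, which matches the description in this article but may not hold for every sysstat version; the function names and the sample numbers are made up.

```python
def mem_used_percent(kbmemfree, kbmemused):
    """%memused as described above: used memory over total memory."""
    total = kbmemfree + kbmemused  # assumption: free + used = total
    return 100.0 * kbmemused / total


def app_used_percent(kbmemfree, kbmemused, kbbuffers, kbcached):
    """Used memory with buffers and cache subtracted, as a percentage."""
    total = kbmemfree + kbmemused
    return 100.0 * (kbmemused - kbbuffers - kbcached) / total


# Made-up sample values in KB, for illustration only.
free_kb, used_kb = 2_000_000, 14_000_000
buffers_kb, cached_kb = 500_000, 9_500_000

print(mem_used_percent(free_kb, used_kb))
print(app_used_percent(free_kb, used_kb, buffers_kb, cached_kb))
```

With these numbers most of the "used" memory is cache that the kernel can reclaim, so the second figure is the one to watch.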

Use the-W parameter

Report swap statistics

Output item description:

pswpin/s: number of swap pages the system brought in per second

pswpout/s: number of swap pages the system brought out per second

Use the-B parameter

Memory paging statistics

Output item description:

pgpgin/s: amount of data the system paged in from disk or swap per second, in KB. On old kernels (2.2.x) this is the number of blocks per second.

pgpgout/s: amount of data the system paged out to disk or swap per second, in KB.

fault/s: number of page faults (major + minor) per second (reported on 2.5 and later kernels). A page fault does not necessarily generate I/O.

majflt/s: number of major page faults per second, that is, faults that required loading a memory page from disk (reported on 2.5 and later kernels).

Description:

High paging is a sign of a lack of memory.

3 Troubleshoot I/O problems

Use the -b parameter

Displays I/O and transfer rate statistics

Output item description:

tps: total number of I/O transfers per second issued to physical devices. A transfer is an I/O request to a physical device; multiple logical requests can be combined into a single I/O request to the device.

rtps: total number of read requests per second issued to physical devices

wtps: total number of write requests per second issued to physical devices

bread/s: amount of data read from physical devices per second, in blocks per second. On 2.4 and later kernels a block is 512 bytes; on older kernels the block size is indeterminate.

bwrtn/s: amount of data written to physical devices per second, in blocks per second

Use the -u or -P parameter

See the introduction in the CPU troubleshooting section above.

Use the -d (-p) parameter

Device block usage (for 2.4 and later kernels)

Output item description:

tps: number of transfers per second issued to the physical disk. Multiple logical requests can be merged into a single disk request, so the size of a transfer is indeterminate.

rd_sec/s: number of sectors read per second. A sector is 512 bytes.

wr_sec/s: number of sectors written per second. A sector is 512 bytes.

avgrq-sz: average size (in sectors) of the I/O requests issued to the device.

avgqu-sz: average length of the disk request queue.

await: average time (in milliseconds) for a request to be served, from the moment it is issued until it completes, including the time spent waiting in the queue.

svctm: average service time (in milliseconds) of the I/O requests issued to the device

%util: percentage of time during which I/O requests were issued to the device (the device's bandwidth utilization). When this value is close to 100%, the device is saturated.
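Since rd_sec/s and wr_sec/s are counted in 512-byte sectors, a little arithmetic turns them into more familiar units. The sketch below converts sectors per second to MiB per second and derives an average request size from the three rate columns. The function names and sample values are hypothetical, and the request size formula (sectors moved divided by transfers) is an approximation of what avgrq-sz reports.

```python
SECTOR_BYTES = 512  # sector size stated in the field descriptions above


def sectors_to_mib(sectors_per_s):
    """Convert a rd_sec/s or wr_sec/s value to MiB per second."""
    return sectors_per_s * SECTOR_BYTES / (1024 * 1024)


def avg_request_kib(rd_sec, wr_sec, tps):
    """Average request size in KiB from sectors moved and transfers issued."""
    if tps == 0:
        return 0.0  # idle device: no requests to average over
    return (rd_sec + wr_sec) / tps * SECTOR_BYTES / 1024


# Made-up sample values, for illustration only.
rd_sec, wr_sec, tps = 20480.0, 4096.0, 192.0

print(sectors_to_mib(rd_sec))                # read throughput in MiB/s
print(sectors_to_mib(wr_sec))                # write throughput in MiB/s
print(avg_request_kib(rd_sec, wr_sec, tps))  # average request size in KiB
```

Small average requests combined with a high tps usually point to random I/O, which is worth knowing before blaming raw disk bandwidth.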

Description:

When avgqu-sz is low, few requests are waiting in the queue, so the device is keeping up with the load.

When %util is close to 100%, the device bandwidth is fully used.

In the output, devices are named in the form devM-n.

M is the major device number and n is the minor device number (2.5 and later kernels); kernels before 2.5 only have a single sequence number.

If you use the -p parameter, device names are printed in readable form where possible.

In other words, -p prints the disk device name, such as sda or hdc. Without -p, devices appear as, for example, dev8-0 or dev22-0.

On some 2.4 kernels, avgqu-sz, await, svctm and %util may be unavailable and are shown as 0.00.

4 Troubleshoot network card traffic problems

Use the-n DEV parameter

Network statistics report

The -n parameter reports different information depending on the keyword that follows it:

DEV: statistics for network devices

EDEV: failure (error) statistics for network devices

NFS: NFS client activity

NFSD: NFS server activity

SOCK: sockets in use

ALL: all of the above network information

Normally we only care about NIC traffic, so the -n DEV parameter is enough.

Output item description (the error counters below are the ones reported with the EDEV keyword):

IFACE: name of the network interface.

rxerr/s: number of bad packets received per second

txerr/s: number of errors per second while sending packets

coll/s: number of collisions per second while sending packets (this only happens in half-duplex mode)

rxdrop/s: number of received packets dropped per second because the buffers were full

txdrop/s: number of transmitted packets dropped per second because the buffers were full

txcarr/s: number of carrier errors per second while sending packets

rxfram/s: number of frame alignment errors per second on received packets

rxfifo/s: number of FIFO overrun errors per second on received packets

txfifo/s: number of FIFO overrun errors per second on transmitted packets
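On a healthy interface all of the error counters above should be zero or very close to it, so a quick way to read an `sar -n EDEV` line is to list only the counters that are not. The sketch below does that for one data line; the helper name `nonzero_counters` and the sample line are made up for illustration, and the column order is assumed to match the order of the descriptions above.

```python
# Assumed column order of an `sar -n EDEV` data line after IFACE.
EDEV_FIELDS = [
    "rxerr/s", "txerr/s", "coll/s", "rxdrop/s", "txdrop/s",
    "txcarr/s", "rxfram/s", "rxfifo/s", "txfifo/s",
]


def nonzero_counters(line):
    """Return (interface, {counter: value}) for counters above zero."""
    parts = line.split()
    # Skip the timestamp (and the AM/PM marker if the locale prints one).
    start = 2 if parts[1] in ("AM", "PM") else 1
    iface = parts[start]
    values = [float(v) for v in parts[start + 1:]]
    return iface, {n: v for n, v in zip(EDEV_FIELDS, values) if v > 0}


# Made-up sample line, for illustration only.
sample = "10:00:01 eth0 0.00 0.00 0.00 3.50 0.00 0.00 0.00 1.00 0.00"
print(nonzero_counters(sample))
```

In this sample the nonzero rxdrop/s and rxfifo/s values would suggest that the receive side is running out of buffer space.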
