Linux performance tools that programmers must know

2025-04-26 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Preface

In day-to-day development, we sometimes receive monitoring alarms from services, such as a sudden spike in CPU or memory usage. When that happens, we log in to the server to troubleshoot. This post covers the knowledge needed for that: Linux performance tools.

A simulated troubleshooting session for a production problem

Background: CPU usage suddenly spiked after the service had been running smoothly for some time.

With the top command, you can confirm which process is causing the CPU to soar (or whether the alarm was a false one).

In the figure, the process with PID 2816 shows very high CPU utilization.

Use top -Hp 2816 to observe the threads under that process. As the figure shows, thread 2825 is using a very high share of CPU.

Python makes it easy to convert the decimal thread ID to hexadecimal. Why convert it?

Because the thread dump file produced in the next step identifies each thread by a hexadecimal NID.
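The conversion can be sketched as follows. The helper name tid_to_nid is an assumption made for illustration; the only fact taken from the text is the thread ID 2825.

```python
def tid_to_nid(tid: int) -> str:
    """Format a decimal Linux thread ID the way a thread dump shows it (nid=0x...)."""
    # hex() returns a lowercase string with a 0x prefix, matching the nid field.
    return hex(tid)


print(tid_to_nid(2825))  # prints 0xb09
```

The resulting string is what to search for in the dump, as in nid=0xb09.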

In practice, run jstack pid several times rather than once. Threads move between states, so multiple dumps capture more information about what each thread is doing.

In the figure, you can observe that one thread has acquired the lock and keeps running without releasing it, while the other thread is stuck waiting for the same lock. At this point, you can go to the code and analyze why the lock is not being released.
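Locating the hot thread in a dump can be sketched as below. The dump text is a hypothetical excerpt written in the style of HotSpot jstack output; the thread names, addresses, and stack frames are invented for illustration and do not come from the figure. The function find_thread is likewise an assumed helper.

```python
import re
from typing import Optional

# Hypothetical excerpt in the style of a jstack dump (all values are made up).
SAMPLE_DUMP = '''\
"worker-1" #12 prio=5 os_prio=0 tid=0x00007f3a1c0e1000 nid=0xb09 runnable [0x00007f3a0a5f4000]
   java.lang.Thread.State: RUNNABLE
        at com.example.Demo.busyLoop(Demo.java:21)
        - locked <0x00000000e1a2b3c8> (a java.lang.Object)

"worker-2" #13 prio=5 os_prio=0 tid=0x00007f3a1c0e3000 nid=0xb0a waiting for monitor entry [0x00007f3a0a4f3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.Demo.busyLoop(Demo.java:19)
        - waiting to lock <0x00000000e1a2b3c8> (a java.lang.Object)
'''


def find_thread(dump: str, nid: str) -> Optional[str]:
    """Return the dump entry whose header line carries nid=<nid>, or None."""
    # Thread entries are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump.strip()):
        header = block.splitlines()[0]
        if re.search(rf"\bnid={re.escape(nid)}\b", header):
            return block
    return None


if __name__ == "__main__":
    print(find_thread(SAMPLE_DUMP, hex(2825)))
```

Running the same lookup against several dumps taken a few seconds apart shows whether the thread stays in the same state and the same stack frame.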

Detailed explanation of the performance monitoring tool top

The case above relied on top, but top displays a large amount of information. It is analyzed in detail here.

First line:

Two times are shown: the current system time and the uptime of the machine. Pay attention to the uptime. [Why? An unnoticed reboot can be the source of many problems.]

The number of users logged in to the system. [More information can be found through who, w, and history.]

What do the three load values mean?

They are the load averages of the machine over the last 1, 5, and 15 minutes. How do you judge the load? It has to be read against the number of CPU cores. For example, on a 4-core machine, a load value above 4 indicates a heavy load. [Press 1 in top to see the individual CPUs.]

The above information can also be obtained through the uptime command.
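As a rough sketch of the rule above, the load averages can be compared with the core count in Python. The function name is_heavily_loaded is an assumption, and the threshold of one runnable task per core is the rule of thumb from the text, not a hard limit.

```python
import os


def is_heavily_loaded(load: float, cores: int) -> bool:
    """Rule of thumb from the text: load above the core count means heavy load."""
    return load > cores


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    cores = os.cpu_count() or 1
    # os.getloadavg() returns the 1, 5, and 15 minute load averages (Unix only).
    for minutes, load in zip((1, 5, 15), os.getloadavg()):
        status = "heavy" if is_heavily_loaded(load, cores) else "ok"
        print(f"{minutes:>2} min load {load:.2f} on {cores} cores: {status}")
```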

The second line:

It mainly shows the total number of tasks and how many are in each state. The focus should be on the number of tasks in the zombie state.
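Zombie tasks can also be counted outside top. The sketch below reads the state field of /proc/<pid>/stat, where Z marks a zombie. The function names are assumptions, and the sample lines in the usage note are invented.

```python
import glob


def state_from_stat(stat_line: str) -> str:
    """Extract the one-letter state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last closing parenthesis.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_zombies() -> int:
    """Count processes currently in state Z (Linux only)."""
    zombies = 0
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                if state_from_stat(f.read()) == "Z":
                    zombies += 1
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
    return zombies


if __name__ == "__main__":
    print("zombie processes:", count_zombies())
```

For example, state_from_stat("3001 (my (odd) proc) Z 2816 3001 3001 0 -1") yields "Z" even though the command name contains parentheses.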

The third line:

Mainly some information about CPU.

US and SY are the percentages of CPU time spent in user space and in kernel (system) space.

NI, or NICE, is the percentage of CPU time spent on user processes whose priority has been adjusted with nice. Under normal conditions it should not be large.

ID means idle. WA is the percentage of time the CPU spends waiting for I/O to complete. For example, if the service writes a lot of logs during a sudden traffic burst, this value will soar because of the heavy disk I/O.

HI stands for hardware interrupts, which are usually raised by peripherals. If HI is skyrocketing, there is likely a problem with a peripheral at the hardware level. SI stands for software interrupts.

ST stands for steal. It is only meaningful when the machine is virtual: it is the percentage of time during which the virtual machine was ready to run but the hypervisor gave the physical CPU to something else, that is, CPU time taken away from this virtual machine.
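The percentages on this line can be approximated from two samples of the first line of /proc/stat, whose counters are cumulative since boot. This is a minimal sketch under that assumption, not a description of how top is implemented; the function names and the sample lines in the usage note are invented for illustration.

```python
# Order of the first eight counters on the "cpu" line of /proc/stat:
# user, nice, system, idle, iowait, irq, softirq, steal.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line: str) -> dict:
    """Map the first eight counters of a 'cpu ...' line to top-style names."""
    values = [int(v) for v in line.split()[1:9]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels report fewer fields
    return dict(zip(FIELDS, values))


def cpu_percentages(before: dict, after: dict) -> dict:
    """Percentage of elapsed CPU time spent in each category between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


if __name__ == "__main__":
    import time

    def read_sample() -> dict:
        with open("/proc/stat") as f:
            return parse_cpu_line(f.readline())

    first = read_sample()
    time.sleep(1)
    print(cpu_percentages(first, read_sample()))
```

With the invented samples "cpu 1000 0 500 8000 100 0 0 0 0 0" and "cpu 1100 0 520 8050 120 0 0 10 0 0", the elapsed total is 200 ticks, of which 100 are user time, so US comes out at 50% and ST at 5%.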

Lines 4 and 5:

Two concepts matter here: buffer and cache.

What is a buffer for? It holds data that is waiting to be processed, mainly to smooth over the speed mismatch between two systems. A cache, on the other hand, generally holds result data for reuse, such as information loaded from a DB to serve later queries.

The swap partition uses the hard disk as an extension of memory. If swapping is very frequent, the machine does not have enough memory!

List description:

PID process ID, USER owner of the process, PR priority, VIRT virtual memory, RES resident memory, SHR shared memory

It is important to point out that RES is the physical memory the process actually occupies, not the amount of memory it has requested. Because RES includes pages shared with other processes, the physical memory used by the current process alone is roughly RES - SHR.
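The same subtraction can be sketched from /proc/<pid>/statm, which reports the resident and shared page counts as its second and third fields. Treating these as the counterparts of RES and SHR is an assumption of this sketch, as is the function name; the numbers in the usage note are invented.

```python
import os


def private_resident_bytes(statm_line: str, page_size: int) -> int:
    """Approximate RES - SHR in bytes from a /proc/<pid>/statm line."""
    fields = statm_line.split()
    # statm fields are counted in pages: size, resident, shared, text, lib, data, dt.
    resident_pages, shared_pages = int(fields[1]), int(fields[2])
    return (resident_pages - shared_pages) * page_size


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open("/proc/self/statm") as f:
        print("RES - SHR (bytes):", private_resident_bytes(f.read(), page_size))
```

For instance, 12000 resident pages and 2000 shared pages with a 4096-byte page size give (12000 - 2000) * 4096 = 40960000 bytes.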
