In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >
Share
Shulou(Shulou.com)06/02 Report--
Preface
I'm sure you've all used the top command under Linux. I've been using top to check the CPU and MEM rankings of processes since I came into contact with Linux. But I don't understand any of the other outputs of the top command. What do these metrics represent, and under what circumstances do I need to pay attention? And what is the source data of the output of the top command, and what is the principle of calculation?
Demonstration environment
# uname-aLinux VM_1_11_centos 3.10.0-693.el7.x86_64 # 1 SMP Tue Aug 22 21:09:27 UTC 2017 x86 "64 GNU/Linux
Top command
Top command is a commonly used performance analysis tool under Linux, which can display the resource usage of the system in real time (the default is refreshed once in 3 seconds), as well as the resource usage of various processes, similar to the task manager of Windows.
Top-11:00:54 up 54 days, 23:35, 6 users, load average: 16.32,18.75, 21.04Tasks: 209 total, 3 running, 205 sleeping, 0 stopped, 1 zombie%Cpu (s): 29.7 us, 18.9 sy, 0.0 ni, 49.3 id, 1.7 wa, 0.0 hi, 0.4 si, 0.0 stKiB Mem: 32781216 total, 1506220 free, 6525496 used, 24749500 buff/cacheKiB Swap: 0 total, 0 free, 0 used. 25607592 avail Mem PID USER PR NI VIRT RES SHR S% CPU% MEM TIME+ COMMAND root 20 0 15.6g 461676 4704 R 198.0 1.411v 15.26 python Root 20 0 9725596 240028 4672 R 113.0 0.7 7:48.49 python root 20 0 6878028 143196 4720 S 82.4 0.4 1:35.03 python
The first line of data is equivalent to the uptime command output. 11:00:54 is the current time, up 54 days,23:55 is the time the system has been running, 6 users indicates that six users are currently logged in, and load average:16.32,18.75,21.04 represents the average load of the system for one minute, 5 minutes and 15 minutes, respectively.
Average load
The average number of active processes represented by the average load, including the number of processes in running, the number of processes preparing for running (ready), and the number of processes in uninterruptible sleep. If the average load number is exactly equal to the number of CPU cores, it proves that each core can be well utilized. If the average load number is greater than the core number to prove that the system is in a state of overload, it is generally considered to be more than 70% of the core number as a serious overload, which requires attention. It is also necessary to combine the average load of 1 minute, the average load of 5 minutes and the average load of 15 minutes to see the trend of load. If the average load of 1 minute is relatively high and the average load of 5 minutes and 15 minutes is relatively low, it means that the average load increases instantly and needs to be observed. If all three values are high, you need to pay attention to whether a process is consuming CPU crazily or has frequent IO operations, or it may be caused by too many processes running in the system and frequent process switching. For example, the above demonstration environment is an 8-core centos machine, which proves that the system is running in a state of overload for a long time.
Tasks: 214 total, 4 running, 209 sleeping, 0 stopped, 1 zombie
Total indicates that the system now has a total of 214user processes, 4 running means 4 processes are in running state, 209 sleeping means 209 processes are in sleeping state, 0 stopped means 0 processes are in stopped state, and 1 zombie means there is a zombie process.
Zombie process
When the child process ends, the parent process does not call wait () / waitpid () to wait for the child process to finish, then a zombie process will be generated. The reason is that the child process does not really exit when it ends, but leaves the data structure of a zombie process in the system process table, waiting for the parent process to clean up. If the parent process has exited, it will be handled by the init process instead of the parent process. Thus, if the parent process does not act and does not exit, there will be a large number of zombie processes, each of which will occupy a slot of the process table. If there are too many zombie processes, the system will not be able to create new processes, because the capacity of the process table is limited. So when the index of zombie is too large, it needs to attract our attention. The S column in the process details below represents the running status of the process, and Z indicates that the process is a zombie process.
Ways to eliminate zombie processes:
1. Find the parent process of the zombie process pid (pstress can show the parent-child relationship of the process), kill-9 pid. Init will automatically clean up the zombie process after the parent process exits. (it should be noted that kill-9 does not kill zombie processes.)
two。 Restart the system.
% Cpu (s): 31.9 us, 30.3 sy, 0.0 ni, 37.0 id, 0.0 wa, 0.0 hi, 0.8 si, 0.0 st
Us user represents the proportion of CPU time in user mode sy system represents the proportion of CPU time in kernel state ni nice represents the proportion of CPU time running low-priority processes id idle represents the proportion of idle CPU time wa iowait represents the proportion of CPU time waiting in IO hi hard interrupt represents the proportion of CPU time dealing with hard interrupts si soft interrupt represents the proportion of CPU time dealing with soft interrupts st steal represents the time when the current system is running in a virtual machine The percentage of CPU time occupied by other virtual machines.
So the overall CPU usage = 1-id. When the us is very high, it is proved that the CPU time is mainly consumed in the user code, and the user code needs to be optimized. When the sy is very high, it means that CPU time is consumed in the kernel, either frequent system calls or frequent CPU switching (process switching / thread switching). When the wa is very high, it indicates that there are frequent IO operations in the process, which may be disk IO or network IO. When the si is very high, it means that CPU time is spent dealing with soft interrupts, and network sending and receiving packets will trigger system soft interrupts, so a large number of network packets will cause soft interrupts to be triggered frequently, and typical SYN Floor will lead to high si.
KiB Mem: 32781216 total, 663440 free, 7354900 used, 24762876 buff/cacheKiB Swap: 0 total, 0 free, 0 used. 24771700 avail Mem
Lines 4 and 5 show the system memory usage. The unit is KiB. Totol represents total memory, free represents content that has not been used, and used is the memory already used. Buff represents memory for read-write disk caching, and cache represents memory for read-write file caching. Avail represents available application memory.
All three values of 0 indicate that the system has turned off the swap function, and since the demo environment is a virtual machine, virtual machines generally turn off the swap feature.
Line 6 begins and then shows the specific status of each process:
PID USER PR NI VIRT RES SHR S CPU MEM TIME+ COMMAND PID process username of the IDUSER process owner, such as rootPR process scheduling priority NI process nice value (priority) The lower the value represents the higher the priority virtual memory used by the VIRT process the physical memory used by the RES process (excluding shared memory) the shared memory used by the SHR process the percentage of CPU used by the CPU process the percentage of memory used by the MEM process to the total CPU time used by the COMMAND process since the start of the COMMAND process (only binary is displayed by default, top-c can display the command line and startup parameters)
Calculation principle
Before introducing the principle of calculating the indicators of the top command, it is necessary to introduce the proc file system under Linux, because the data of the top command comes from the proc file system. The proc file system is a virtual file system, which is a way of communication between the Linux kernel and the user. The Linux kernel will tell the user the current state information of the kernel through the proc file system, and the user can also set some behavior of the kernel by writing proc. Unlike ordinary files, these proc files are created and modified dynamically because the state of the kernel changes all the time.
The CPU metrics displayed by top are all derived from / proc/stat file information:
# cat / proc/stat cpu 1151829380 20277 540128095 1909004524 21051740 0 10957596 00 0cpu0 143829475 3918 67658924 235696976 5168514 0 1475030 00 0cpu1 144407338 1966 67616825 236756510 3969110 0 1392212 00 0cpu2 144531920 2287 67567520 238021699 2713175 0 1363460 00 0cpu3 143288938 2366 67474485 239715220 2223739 0 1356698 00 0cpu4 143975390 3159 67394206 239494900 1948424 0 1343261 00 0cpu5 144130685 2212 67538520 239431294 1780756 0 1349882 00 0cpu6 144009592 2175 67536945 239683876 1668203 0 1340087 00 0cpu7 143656038 2193 67340668 240204045 1579816 0 1336963 00 0
The first line represents the total CPU information, followed by the details of a CPU.
But what is the information behind these specific columns? we can find the answer through man proc:
User (1) Time spent in user mode.nice (2) Time spent in user mode with low priority (nice). System (3) Time spent in system mode.idle (4) Time spent in the idle task. This value should be USER_HZ times the second entry in the / proc/uptime pseudo-file.iowait (since Linux 2.5.41) (5) Time waiting for I to complete.irq O to complete.irq (since Linux 2.6.0-test4) (6) Time servicing interrupts.softirq (since Linux 2.6.0-test4) (7) Time servicing softirqs.steal (since Linux 2.6.11) (8) Stolen time Which is the time spent in other operating systems when running in a virtual- ized environmentguest (since Linux 2.6.24) (9) Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.guest_nice (since Linux 2.6.33) (10) Time spent running a niced guest (virtual CPU for guest operating systems under the con- trol of the Linux kernel).
That is to say, starting from the second column, there are user,nice,system,idle,iowait,irq (hard interrupt), softirq (soft interrupt), and CPU time of steal,guest,guest_nice, usually in 10ms. So how is the proportion in top calculated?
Because the CPU time is a cumulative value, we require a time period difference to reflect the current CPU situation, and the top defaults to 3s.
Where total is equal to the sum of the above, that is, total=user+nice+system+idle+iowait+irq+softirq+steal+guest+guest_nice. 3 seconds later, go to a user value user2 and a total total2.
Then the average cpu percentage of user for these 3 seconds is equal to ((user2-user1) / (total2-total1)) / 3 * 100%. In addition, each specific CPU calculation method is the same.
# cat / proc/meminfo MemTotal: 32781216 kBMemFree: 1043556 kBMemAvailable: 25108920 kBBuffers: 427516 kBCached: 22084612 kBSwapCached: 0 kBActive: 18640888 kBInactive: 10534920 kBActive (anon): 6664480 kBInactive (anon): 412 kBActive (file): 11976408 kBInactive (file): 10534508 kBUnevictable: 4 kBMlocked: 4 kBSwapTotal: 0 kBSwapFree: 0 kBDirty: 1092 kBWriteback: 0 kBAnonPages: 6663764 kBMapped: 347808 kBShmem: 1212 kBSlab: 2201292 kBSReclaimable: 1957344 kBSUnreclaim: 243948 kBKernelStack: 73392 kBPageTables: 57300 kBNFS_Unstable: 0 kBBounce: 0 kBWritebackTmp: 0 kBCommitLimit: 16390608 kBCommitted_AS: 42170784 kBVmallocTotal: 34359738367 kBVmallocUsed: 61924 kBVmallocChunk: 34359625048 kBHardwareCorrupted: 0 kBAnonHugePages: 364544 kBHugePages_Total: 0HugePages_Free: 0HugePages_Rsvd: 0HugePages_Surp: 0Hugepagesize: 2048 kBDirectMap4k: 376680 kBDirectMap2M: 26886144 kBDirectMap1G: 8388608 kB
Where total corresponds to MemTotal,free corresponds to MemFree,avail corresponds to MemAailable.
Summary
Starting from the output result of top command, this paper explains which index outliers need our attention. Finally, it introduces the cpu calculation principle of top command and the data source of mem.
Well, the above is the whole content of this article. I hope the content of this article has a certain reference and learning value for your study or work. Thank you for your support.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.