2025-02-23 Update From: SLTechnology News&Howtos
Preface
Drawing on the author's support work for the SUNING cloud platform, this article summarizes the common problem scenarios, the analysis toolbox, and the diagnostic approaches for operating Linux servers on a cloud platform (physical machines and virtual machines are distinguished below). It has three main parts:
1. Common tools, criteria, and analysis approaches for abnormal CPU, I/O, and memory performance on Linux servers.
2. Possible causes, location methods, and general analysis approaches for unexpected downtime of Linux servers.
3. Possible causes, location methods, and general analysis approaches for packet loss on Linux servers.
Audience: intermediate and senior Linux server operations staff
Note: alongside the problem diagrams, this article lists only the parameters and usage of each tool that matter most for analysis, by way of example. For the detailed usage of each tool, read its man page.
Linux server CPU, I/O, and memory performance anomalies
CPU anomalies
Fig. 1 Breakdown of CPU anomalies
top
top -H -d 1 -c
Highlight the sort column and running processes: z x y
Select the sort column: shift + left/right arrow
pidstat
-d I/O statistics (disk reads and writes)
-r memory usage and page faults
-u CPU usage
-l display the command line and its arguments
-w context switches
-t also display statistics for threads
-T scope of the report: TASK, CHILD, or ALL
Show the CPU usage of active processes every second:
pidstat -u 1
Show CPU time per thread, aggregated by thread group, to help find busy threads:
pidstat -t 1 -T ALL
sar
-b I/O and block transfer statistics
-B paging statistics
-r memory utilization statistics
-R memory page statistics (pages freed per second)
-d block device usage statistics
-q run queue length and load averages
-S swap space usage
-m CPU operating frequency
-v inode, file, and dentry activity statistics
-w task creation and context switching
-W swap-in and swap-out statistics
-n network statistics; keywords: DEV, EDEV, NFS, NFSD, SOCK, IP, EIP, ICMP, EICMP, TCP, ETCP, UDP, SOCK6, IP6, EIP6, ICMP6, EICMP6, and UDP6
-s 00:00:00 -e 00:21:00 start and end times of the window to view
iotop
iostat
nmon
Analysis with NMONVisualizer
NMONVisualizer is a visual analysis tool for nmon data, from IBM.
sysrq
Turn on the switch:
echo 1 > /proc/sys/kernel/sysrq
Print the process stacks:
echo t > /proc/sysrq-trigger
E.g., if a soft lockup has already occurred and the business impact is obvious, stop the business first and then generate a vmcore with:
echo c > /proc/sysrq-trigger
strace
-c count the system calls and the time spent in them
-f also trace child processes
-e select the calls of interest, e.g. -e open,write
E.g.
When a command hangs as soon as it is executed, find out which syscall it is stuck in:
strace cmd arg
When a running process hangs, attach to it and collect syscall statistics:
strace -p PID -c
gdb
bt view the execution stack
frame switch the working frame
When a user process's CPU consumption affects the system as a whole, debuginfo and source code make it possible to roughly work out where the time goes. Note that the process is STOPPED once gdb attaches.
gdb -p PID
perf
perf top: online sampling and display
-e the event to sample; the default is cycles, and perf list shows the available events
-G call graph
-F sampling frequency
-d refresh interval
-p a specific process
-C specific CPU cores
perf top -d 1 -G -F 99 -z
shift + e expand the stack view
shift + c collapse the stack view
perf record/report
record writes the samples to the perf.data file
report parses and displays them
perf record -F 99 -a -g -p PID -C 6 sleep 5
perf report
Memory anomalies
Fig. 2 Breakdown of memory anomalies
General checks
free
cat /proc/self/status
cat /proc/self/smaps
numastat -m
numactl --hardware
cat /proc/meminfo
zoneinfo
cat /proc/zoneinfo | egrep "zone|min|low|high|free|present"
....
Node 0, zone Normal
  pages free 3195167
        min 13740
        low 17175
        high 20610
        present 3361280
    nr_free_pages 3195167
        high: 186
        high: 186
Three watermarks
sysctl -a | grep extra_free_kbytes
The values of min_free_kbytes and extra_free_kbytes combine to form three watermarks:
Direct reclaim line MIN: min_free_kbytes
Background reclaim line LOW: 5/4 * min_free_kbytes + extra_free_kbytes
Background reclaim stops HIGH: 3/2 * min_free_kbytes + extra_free_kbytes
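The three formulas above can be sketched as a small shell function. This is only an illustration: the name watermarks is hypothetical, and the 5/4 and 3/2 factors are taken from this article rather than checked against any particular kernel version. The result is in whatever unit the inputs use.

```shell
#!/bin/sh
# Hypothetical helper: compute the three watermarks described above.
# Usage: watermarks MIN_FREE [EXTRA_FREE]
watermarks() {
    min=$1
    extra=${2:-0}
    # Integer arithmetic: multiply before dividing to limit truncation.
    low=$(( min * 5 / 4 + extra ))
    high=$(( min * 3 / 2 + extra ))
    echo "min=$min low=$low high=$high"
}
```

Called as watermarks 13740 0, with the min value of zone Normal from the zoneinfo sample, it yields low=17175 and high=20610, matching the low and high values in that sample.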
Physical page state
cat /proc/buddyinfo
Node 0, zone DMA 2 21 1 10 10 1 3
Node 0, zone DMA32 730 596 414 339 277 214 159 127 85 68 557
Node 0, zone Normal 447 558 348 166 72 45 1021 888 607 252 2661
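Each column of /proc/buddyinfo counts the free blocks of one order, where a block of order k is 2^k contiguous pages, so the total free pages of a zone is the weighted sum of its columns. A minimal sketch of that sum follows; the function name buddy_free_pages and its output format are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: total free pages per zone from buddyinfo-format
# text on stdin. Fields 5..NF are the block counts for orders 0, 1, 2, ...
buddy_free_pages() {
    awk '$3 == "zone" {
        node = $2
        sub(/,/, "", node)          # "0," -> "0"
        pages = 0
        for (i = 5; i <= NF; i++)
            pages += $i * 2 ^ (i - 5)
        printf "node %s zone %s free_pages %d\n", node, $4, pages
    }'
}
# On a live system: buddy_free_pages < /proc/buddyinfo
```

A zone whose free memory sits mostly in the low-order columns is fragmented, even when the total looks healthy.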
Kernel data structure caches
slabtop shows how much memory the kernel's data structures currently consume
I/O anomalies
Fig. 3 Breakdown of I/O anomalies
I/O scheduler
cfq, deadline, noop
blktrace & blkparse
When unexpected I/O latency occurs, you need a detailed picture of how the latency is distributed. Use blktrace and blkparse for in-depth analysis.
dd
Learn to use the oflag flags properly: sync flushes data out synchronously, and direct bypasses the page cache
fio
A convenient tool for benchmarking the I/O capability of a system
fio -filename=/dev/mapper/vg_os-testlv -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=8k -size=100G -numjobs=96 -runtime=60 -group_reporting -name=mytest
du & df
Query and analyze block usage and file system usage
strace reveals how the two commands differ in principle: df reads file system information, while du calls stat on each file and sums the results.
When the two differ greatly, investigate further:
Are there holes (sparse files)?
Is there a file that users can no longer see but that the file system has not actually deleted (that is, a file deleted while still open; check with lsof +L1)?
Are files in an underlying directory hidden by a mount point?
If df reports an error, could the file system be damaged?
Network anomaly scenarios
Fig. 4 Network anomaly analysis
ethtool
ethtool -S: watch the drop and error counters
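Because ethtool -S can print hundreds of counters, it helps to filter for the non-zero drop and error counters. The sketch below assumes the usual "name: value" line format. The function name nonzero_drops and the interface name eth0 are hypothetical, and counter names vary by driver.

```shell
#!/bin/sh
# Hypothetical helper: keep only the non-zero drop/error counters from
# "ethtool -S" style output ("name: value" lines) read on stdin.
nonzero_drops() {
    awk -F': *' 'tolower($1) ~ /drop|err/ && ($2 + 0) > 0 {
        name = $1
        gsub(/^[ \t]+/, "", name)   # strip the leading indentation
        print name, $2
    }'
}
# On a live system (eth0 is an assumed interface name):
#   ethtool -S eth0 | nonzero_drops
```

Running it twice a few seconds apart and comparing the values shows whether a counter is still growing or only reflects past drops.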
tc
Check statistics with tc -s -d qd and watch for dropped packets
ss, netstat, iftop
Commonly used connection views:
netstat -ntp
netstat -ntpl
ss -ie
tcpdump
-i name of the network interface to capture on
-w capture file to write; the name may be a time format string
-G rotation interval (in seconds)
-c exit after capturing this many packets
-s capture only the first part of each packet, in bytes
-r read a capture file for offline analysis
-z run gzip or another tool to compress the capture files
-Z user to switch to; the default is tcpdump
-B buffer size in KB, e.g. 10240; with too small a buffer, not every packet is captured
E.g.
tcpdump tcp port 80 and host
tcpdump -s 0 -w %m_%d_%H_%M_%S.pcap -G 5 -z gzip -Z root -c 100000 -i any
Downtime scenario analysis
Fig. 5 Downtime scenario analysis
dropwatch
crash tool
log view the logs associated with the crash
bt view where the crash happened
sys view basic system information
crash vmcore vmlinux
vmlinux comes from the kernel debuginfo package and is a kernel binary image with debugging information. If the system does not generate a vmcore correctly, check the /etc/kdump.conf configuration and the vmcore path set in it.
Kernel-state issues are discussed only to this point, and common exception analysis scenes are not summarized further here.
This article has summarized several common approaches and toolsets for analyzing Linux server anomalies on a cloud platform. As noted at the beginning, though, identifying and locating problems quickly and effectively depends on familiarity with the system and careful judgment on the scene. Use the toolbox flexibly according to the scenario, understand the system from the surface inward and from shallow to deep, and you can solve production problems quickly and efficiently. Have fun :)
About the author
Xie Yinghao is a senior engineer at the Cloud Platform Center of SUNING Technology Group. He has long worked in Linux kernel and operating system support, ensuring the stable and efficient operation of the large fleet of KVM servers in SUNING's production cloud environment.