Introduction and use of the analysis tool perf


In this issue, the editor brings you an introduction to the analysis tool perf and how to use it. The article is rich in content and treats the topic from a professional point of view. I hope you get something out of it after reading.

Test environment: Ubuntu 16.04 + kernel 4.4.0-31

Ubuntu installation:

apt-get install linux-source
cd /usr/src/tools/perf
make && make install

RHEL installation:

yum -y install perf.x86_64

System-level performance optimization usually consists of two phases: performance profiling (performance profiling) and code optimization.

The goal of performance analysis is to find performance bottlenecks, i.e. the causes of performance problems and the hot code.

The goal of code optimization is to optimize the code or the compilation options for the specific performance problems found, in order to improve software performance.

In the performance analysis stage we rely on existing profiling tools such as perf. In the code optimization stage it is often up to the developer's experience to write concise, efficient code, and even to make sensible use of individual instructions at the assembly level and arrange their execution order carefully.

Perf is a Linux performance analysis tool. Linux performance counters are a kernel-based subsystem that provides a performance analysis framework covering both hardware features (CPU, PMU (Performance Monitoring Unit)) and software features (software counters, tracepoints).

With perf, applications can take advantage of the PMU, tracepoints, and counters in the kernel for performance statistics. It can analyze the performance of applications (per thread) as well as kernel performance problems, and of course it can also analyze application and kernel together, giving a complete picture of the performance bottlenecks in an application.

Using perf, you can analyze hardware events that occur while a program runs, such as instructions retired and processor clock cycles, as well as software events, such as page faults and process switches.

Perf is a comprehensive analysis tool, ranging from overall system performance down to a process's threads, and even to the function and assembly level.

Perf offers a full arsenal of tools: you can swing a broadsword for coarse-grained analysis or pick up a scalpel for fine-grained dissection.

1. Background knowledge

1.1 tracepoints

Tracepoints are hooks scattered through the kernel source code; they are triggered when particular code is executed and can be used by various trace/debug tools.

Perf records the events generated at tracepoints and produces reports. By analyzing these reports, you can understand what the kernel was doing while the program ran and make an accurate diagnosis of performance symptoms.

The corresponding debugfs nodes for these tracepoints are under the /sys/kernel/debug/tracing/events directory.
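As a quick illustration (assuming root privileges and that debugfs is mounted at the usual location), you can browse the tracepoint directory and sample a single tracepoint directly:

ls /sys/kernel/debug/tracing/events
sudo perf record -e sched:sched_switch -a sleep 5
sudo perf report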

1.2 Hardware feature: cache

Memory reads and writes are fast, but still cannot keep up with the speed at which the processor executes instructions. To fetch instructions and data from memory, the processor has to wait, and in processor time terms the wait is very long. A cache is a type of SRAM that reads and writes quickly enough to match the processor. By keeping frequently used data in the cache, the processor avoids waiting, which improves performance. Caches are generally very small, and making full use of them is an important part of software tuning.

2. Main concerns

Based on performance analysis, you can carry out algorithm optimization (trading off space complexity against time complexity) and code optimization (improving execution speed and reducing memory footprint).

Evaluate the program's use of hardware resources, such as the number of accesses to each cache level, the cache misses at each level, pipeline stall cycles, front-side bus accesses, and so on.

Evaluate the program's use of operating system resources: the number of system calls, context switches, and task migrations.

There are three types of events:

Hardware Events are generated by the PMU to detect whether, and how many times, certain performance events occur under given conditions, such as cache hits.

Software Events are generated by the kernel and distributed across its functional modules; they count performance events related to the operating system, such as process switches and timer ticks.

Tracepoint Events are triggered by static tracepoints in the kernel. They reveal details of the kernel's behaviour while the program runs, such as how many times the slab allocator was invoked. (The commands below show how each type is selected with -e.)
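A minimal sketch of selecting one event of each type (ls is just a convenient short-lived workload):

sudo perf stat -e cache-misses ls        # Hardware Event
sudo perf stat -e context-switches ls    # Software Event
sudo perf stat -e sched:sched_switch ls  # Tracepoint Event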

3. The use of perf

Running perf --help shows perf's subcommands.

1. annotate: parses the perf.data file generated by perf record and displays the annotated code.
2. archive: packages the sampled elf files according to the build-ids recorded in the data file; with this archive, the sampled data can be analyzed on any machine.
3. bench: benchmarks built into perf, currently including suites for the scheduler and the memory management subsystem.
4. buildid-cache: manages perf's buildid cache. Every elf file has a unique buildid, which perf uses to associate performance data with elf files.
5. buildid-list: lists all buildids recorded in the data file.
6. diff: compares two data files and can give the differences of each symbol (function) in the hot-spot analysis.
7. evlist: lists all performance events in the perf.data data file.
8. inject: reads the event stream recorded by perf record and directs it to standard output; other events can be injected into the stream at any point of the parsed code.
9. kmem: a tool for tracing the kernel memory (slab) subsystem.
10. kvm: used to trace tests running in a guest OS on a KVM virtual machine.
11. list: lists all performance events supported by the current system, including hardware events, software events and tracepoints.
12. lock: analyzes lock information in the kernel, including lock contention, wait latency, and so on.
13. mem: memory access analysis.
14. record: collects samples and records them in a data file, which can then be analyzed with other tools.
15. report: reads the data file created by perf record and produces the hot-spot analysis.
16. sched: an analysis tool for the scheduler subsystem.
17. script: executes extension scripts written in perl or python, generates script skeletons, reads data from the data file, and so on.
18. stat: runs a command and collects its performance profile, including CPI, cache miss rate, and so on.
19. test: runs sanity tests on the current software and hardware platform; it can be used to check whether the platform supports all of perf's features.
20. timechart: a tool that visualizes system behaviour during a test.
21. top: similar to the Linux top command, analyzes system performance in real time.
22. trace: a tool for syscalls.
23. probe: used to define dynamic probe points.

Overall overview:

Perf list to view performance events supported by the current system

Perf bench makes a survey of the system performance.

Perf test tests the soundness of the system.

Perf stat performs statistics on global performance

Global details:

Perf top can view the current system process function occupancy in real time.

Perf probe can customize dynamic events

Specific feature analysis:

perf kmem: performance analysis of the slab subsystem

perf kvm: analysis of kvm virtualization

perf lock: lock performance analysis

perf mem: memory access performance analysis

perf sched: analysis of kernel scheduler performance

perf trace: records the trace of system calls

The most commonly used command is perf record. It can cover the whole system or be narrowed down to a single process, or even a single event of a single process; it can be used at a macro level or very microscopically.

perf record records information into perf.data.

perf report generates a report from it.

perf diff compares two records.

Perf evlist lists logged performance events

Perf annotate displays perf.data function code

Perf archive packages related symbols to facilitate analysis on other machines.

Perf script outputs readable text from perf.data

Visualization tool perf timechart

Perf timechart record records events

Perf timechart generates output.svg documents

3.0 Overhead introduced by perf

Perf testing inevitably introduces additional load in three forms:

Counting: the kernel provides count summaries, mostly for Hardware Events, Software Events, PMU counts, and so on. The related command is perf stat.

Sampling: perf caches the event data in a buffer and then writes it asynchronously to the perf.data file, which is analyzed offline with tools such as perf report.

bpf: new in kernel 4.4+, provides more efficient filtering and output summaries.

The extra load introduced by counting is minimal; sampling can introduce a very large load in some cases; bpf can effectively reduce the load.

For sampling, the I/O load of reading and writing can be reduced effectively by mounting a RAM-backed file system:

mkdir /tmpfs
mount -t tmpfs tmpfs /tmpfs
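A sketch of pointing perf at the RAM-backed mount (the file name is arbitrary):

sudo perf record -o /tmpfs/perf.data -a -g sleep 10
sudo perf report -i /tmpfs/perf.data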

3.1 perf list

perf list run as an ordinary user does not show all supported event types; use sudo perf list for the full set.

It can also display the events of a particular class: hw/cache/pmu are hardware-related; tracepoint is based on the kernel's ftrace; sw events are actually kernel counters.

hw/hardware shows the supported hardware events, for example:

al@al-System-Product-Name:~/perf$ sudo perf list hardware

List of pre-defined events (to be used in -e):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]

sw/software displays the list of supported software events:

al@al-System-Product-Name:~/perf$ sudo perf list sw

List of pre-defined events (to be used in -e):

  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]

cache/hwcache displays the list of hardware cache events:

al@al-System-Product-Name:~/perf$ sudo perf list cache

List of pre-defined events (to be used in -e):

  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  L1-dcache-prefetches                               [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  L1-icache-prefetches                               [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  node-load-misses                                   [Hardware cache event]
  node-loads                                         [Hardware cache event]

pmu displays the list of supported PMU events:

al@al-System-Product-Name:~/perf$ sudo perf list pmu

List of pre-defined events (to be used in -e):

  branch-instructions OR cpu/branch-instructions/            [Kernel PMU event]
  branch-misses OR cpu/branch-misses/                        [Kernel PMU event]
  cache-misses OR cpu/cache-misses/                          [Kernel PMU event]
  cache-references OR cpu/cache-references/                  [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                              [Kernel PMU event]
  instructions OR cpu/instructions/                          [Kernel PMU event]
  msr/aperf/                                                 [Kernel PMU event]
  msr/mperf/                                                 [Kernel PMU event]
  msr/tsc/                                                   [Kernel PMU event]
  stalled-cycles-backend OR cpu/stalled-cycles-backend/      [Kernel PMU event]
  stalled-cycles-frontend OR cpu/stalled-cycles-frontend/    [Kernel PMU event]

tracepoint displays the list of all supported tracepoints, which is quite long:

al@al-System-Product-Name:~/perf$ sudo perf list tracepoint

List of pre-defined events (to be used in -e):

  alarmtimer:alarmtimer_cancel                       [Tracepoint event]
  alarmtimer:alarmtimer_fired                        [Tracepoint event]
  alarmtimer:alarmtimer_start                        [Tracepoint event]
  alarmtimer:alarmtimer_suspend                      [Tracepoint event]
  block:block_bio_backmerge                          [Tracepoint event]
  block:block_bio_bounce                             [Tracepoint event]
  block:block_bio_complete                           [Tracepoint event]
  block:block_bio_frontmerge                         [Tracepoint event]
  block:block_bio_queue                              [Tracepoint event]
  ...

3.2 perf top

By default, perf top cannot display information for an ordinary user; run sudo perf top or echo -1 > /proc/sys/kernel/perf_event_paranoid (on Ubuntu 16.04, also echo 0 > /proc/sys/kernel/kptr_restrict).

A normal perf top display has the following columns:

The first column: the percentage of the performance events attributed to the symbol, i.e. the share of CPU cycles it occupies.

The second column: the DSO (Dynamic Shared Object) containing the symbol, which can be an application, the kernel, a dynamic link library, or a module.

The third column: the type of the DSO. [.] means the symbol belongs to a user-mode ELF file (an executable or a dynamic link library); [k] means the symbol belongs to the kernel or a module.

The fourth column: the symbol name. Some symbols cannot be resolved to function names and can only be shown as addresses.

The common commands for the perf top interface are as follows:

h: show help with detailed information.

UP/DOWN/PGUP/PGDN/SPACE: move up and down and turn pages.

a: annotate the current symbol; shows the disassembly together with the sample percentage of each instruction.

d: filter out all symbols that do not belong to the selected DSO, which is convenient for viewing symbols of the same category.

P: save the current information to perf.hist.N.

The common options for perf top are:

-e: specify the performance event to analyze.

-p: profile events on an existing process ID (comma separated list); only the target process and the threads it creates are analyzed.

-k: path to vmlinux; needed for the annotation functionality (a kernel image with a symbol table).

-K: do not display symbols belonging to the kernel or modules.

-U: do not display symbols belonging to user-mode programs.

-d: refresh period of the interface; the default is 2 s, because perf top reads performance data from the mmap memory area every 2 s by default.

-g: obtain the call graph of functions.

perf top --call-graph [fractal]: path probabilities are relative values that add up to 100%; the call order is bottom-up.

perf top --call-graph graph: path probabilities are absolute values that add up to the heat of the function.
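A usage sketch combining the options above (the PID 1234 is hypothetical):

sudo perf top -e cache-misses            # sample a specific hardware event
sudo perf top -p 1234 -d 5               # only PID 1234 and its threads, refresh every 5 s
sudo perf top --call-graph graph         # with call graphs, absolute percentages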

3.3 perf stat

perf stat runs a command and analyzes the statistics of its execution. Although perf top can also be pointed at a pid, the application must already be running before you can look at it; perf stat, in contrast, can cover the application's whole lifetime.

The command format is:

perf stat [-e <EVENT> | --event=EVENT] [-a] <command>

perf stat [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]

Let's take a brief look at the output of perf stat:

al@al-System-Product-Name:~/perf$ sudo perf stat
^C
 Performance counter stats for 'system wide':

      40904.820871      cpu-clock (msec)          #    5.000 CPUs utilized
             18132      context-switches          #    0.443 K/sec
              1053      cpu-migrations            #    0.026 K/sec
              2420      page-faults               #    0.059 K/sec
        3958376712      cycles                    #    0.097 GHz                      (49.99%)
         574598403      stalled-cycles-frontend   #   14.52% frontend cycles idle     (49.98%)
        9392982910      stalled-cycles-backend    #  237.29% backend cycles idle      (50.00%)
        1653185883      instructions              #    0.42  insn per cycle
                                                  #    5.68  stalled cycles per insn  (50.01%)
         237061366      branches                  #    5.795 M/sec                    (50.02%)
          18333168      branch-misses             #    7.73% of all branches          (50.00%)

       8.181521203 seconds time elapsed

The output is explained as follows:

cpu-clock: the processor time actually consumed by the task, in ms. CPUs utilized = task-clock / time elapsed, i.e. the CPU utilization.

context-switches: the number of context switches that occurred while the program ran.

cpu-migrations: the number of processor migrations that occurred while the program ran. To keep the load balanced across processors, Linux migrates a task from one CPU to another under certain conditions.

CPU migration vs. context switch: a context switch does not necessarily imply a CPU migration, but a CPU migration always implies a context switch. When a context switch happens, the task may simply be switched out on the current CPU and scheduled back onto the same CPU next time.

page-faults: the number of page fault exceptions. A page fault is triggered when the page requested by the application has not been created yet, is not in memory, or is in memory but the mapping between the physical and virtual address has not been established. TLB misses, page access permission mismatches and similar situations also trigger page faults.

cycles: the number of processor cycles consumed. If the CPU cycles used by the command are treated as coming from a single processor, its effective clock rate is cycles / task-clock.
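As a worked example using the system-wide output above: 3958376712 cycles / 40.904820871 s of cpu-clock ≈ 96.8 million cycles per second ≈ 0.097 GHz, which matches the figure perf prints in the comment column.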

stalled-cycles-frontend: clock cycles in which the instruction fetch or decode step stalled instead of progressing as expected.

stalled-cycles-backend: clock cycles in which the instruction execution step stalled.

instructions: how many instructions were executed. IPC (instructions per cycle) is the average number of instructions executed per CPU cycle.

branches: the number of branch instructions encountered. branch-misses is the number of branch instructions whose outcome was mispredicted.

Other commonly used parameters

-a, --all-cpus: collect statistics from all CPUs

-C, --cpu <cpu>: collect statistics only for the specified CPUs

-c, --scale: scale/normalize counters

-D, --delay <n>: ms to wait before starting measurement after program start

-d, --detailed: detailed run - start a lot of events

-e, --event <event>: event selector; use 'perf list' to list available events

-G, --cgroup <name>: monitor events only in the named cgroup

-g, --group: put the counters into a counter group

-I, --interval-print <n>: print counts at a regular interval in ms (>= 10)

-i, --no-inherit: child tasks do not inherit counters

-n, --null: null run - don't start any counters

-o, --output <file>: write the statistics to a file

-p, --pid <pid>: stat events on an existing process id

-r, --repeat <n>: repeat the command and print average +- stddev (max: 100, forever: 0)

-S, --sync: call sync() before starting a run

-t, --tid <tid>: stat events on an existing thread id

...
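A few combined usage sketches (the PID is hypothetical):

sudo perf stat -r 5 -e cycles,instructions,cache-misses ls   # run ls 5 times, report mean and stddev
sudo perf stat -p 1234 -I 1000 sleep 10                      # attach to an existing PID, print counts every 1000 ms for 10 s
sudo perf stat -a -o stat.txt sleep 5                        # all CPUs, write the result to stat.txt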

Example

Following the system-wide run above, here is an example of collecting statistics for a single CPU:

Run sudo perf stat -C 0 to count events on CPU 0, and press Ctrl+C when you want to stop. The statistics items are the same; only the object being measured has changed.

al@al-System-Product-Name:~/perf$ sudo perf stat -C 0
^C
 Performance counter stats for 'CPU(s) 0':

       2517.107315      cpu-clock (msec)          #    1.000 CPUs utilized
              2941      context-switches          #    0.001 M/sec
               109      cpu-migrations            #    0.043 K/sec
                38      page-faults               #    0.015 K/sec
         644094340      cycles                    #    0.256 GHz                      (49.94%)
          70425076      stalled-cycles-frontend   #   10.93% frontend cycles idle     (49.94%)
         965270543      stalled-cycles-backend    #  149.86% backend cycles idle      (49.94%)
         623284864      instructions              #    0.97  insn per cycle
                                                  #    1.55  stalled cycles per insn  (50.06%)
          65658190      branches                  #   26.085 M/sec                    (50.06%)
           3276104      branch-misses             #    4.99% of all branches          (50.06%)

       2.516996126 seconds time elapsed

If you need to count more items, use -e, for example:

perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls

The results are as follows, and the special items of concern are also included in the statistics.

al@al-System-Product-Name:~/perf$ sudo perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls

 Performance counter stats for 'ls':

          2.319422      task-clock (msec)         #    0.719 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                89      page-faults               #    0.038 M/sec
           2142386      cycles                    #    0.924 GHz
            659800      stalled-cycles-frontend   #   30.80% frontend cycles idle
            725343      stalled-cycles-backend    #   33.86% backend cycles idle
           1344518      instructions              #    0.63  insn per cycle
                                                  #    0.54  stalled cycles per insn
                        branches
                        branch-misses
                        L1-dcache-loads
                        L1-dcache-load-misses
                        LLC-loads
                        LLC-load-misses
                        dTLB-loads
                        dTLB-load-misses

       0.003227507 seconds time elapsed

3.4 perf bench

Perf bench, as a general framework for benchmark tools, includes subsystems such as sched/mem/numa/futex, all of which can be specified by all.

Perf bench can be used to evaluate specific performance such as system sched/mem.

perf bench sched: scheduler and IPC mechanisms; includes two benchmarks, messaging and pipe.

perf bench mem: memory access performance; includes two benchmarks, memcpy and memset.

perf bench numa: scheduling and memory placement performance on NUMA architectures; contains the mem benchmark.

perf bench futex: futex stress tests; contains the hash/wake/wake-parallel/requeue/lock-pi benchmarks.

perf bench all: runs the whole collection of bench tests.

3.4.1 perf bench sched all

Tests the performance of messaging and pipe.

3.4.1.1 sched messaging evaluates process scheduling and inter-core communication

sched messaging is ported from the classic test program hackbench and measures the performance, overhead and scalability of the scheduler.

The benchmark starts N pairs of sender/receiver processes or threads that read and write concurrently through IPC (socketpair or pipe). Typically N is increased to measure the scalability of the scheduler.

sched messaging is used in the same way as hackbench; different aspects can be tested by changing its parameters (see the illustrative commands after the option list):

-g, --group: specify the number of groups

-l, --nr_loops: specify the number of loops to run (default: 100)

-p, --pipe: use pipe() instead of socketpair()

-t, --thread: be multi-threaded instead of multi-process
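An illustrative pair of runs that exercises the options above:

perf bench sched messaging -g 20 -l 1000        # 20 groups, 1000 loops, socketpair()
perf bench sched messaging -g 20 -l 1000 -p -t  # the same workload over pipes, using threads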

Test results:

al@al-System-Product-Name:~/perf$ perf bench sched all
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.173 [sec]

# Running sched/pipe benchmark...
# Executed 1000000 pipe operations between two processes

     Total time: 12.233 [sec]

      12.233170 usecs/op
          81744 ops/sec

The impact of using pipe () and socketpair () on testing:

1. perf bench sched messaging

# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.176 [sec]

2. perf bench sched messaging -p

# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.093 [sec]

It can be seen that socketpair() performs noticeably worse than pipe().

3.4.1.2 sched pipe evaluates pipe performance

sched pipe is ported from Ingo Molnar's pipe-test-1m.c, which was originally written to test the performance and fairness of different schedulers.

Its principle is very simple: two processes send 1000000 integers back and forth over a pipe as fast as they can, A sending to B and B sending to A. Because A and B depend on each other, if the scheduler is unfair and favours A over B, then A and B as a whole will take longer.

al@al-System-Product-Name:~/perf$ perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

     Total time: 12.240 [sec]

      12.240411 usecs/op
          81696 ops/sec

3.4.2 perf bench mem all

This test measures how long different implementations of the memcpy/memset functions take to process 1MB of data, which is then converted into throughput.

al@al-System-Product-Name:~/perf$ perf bench mem all
# Running mem/memcpy benchmark...
# function 'default' (Default memcpy() provided by glibc)
# Copying 1MB bytes...

       1.236155 GB/sec

...

3.4.3 perf bench futex

Futex is a mixed user-mode/kernel-mode mechanism, so it requires cooperation between the two parts. Linux provides the sys_futex system call to support synchronization when processes contend.

All futex synchronization starts in user space by creating a futex synchronization variable, i.e. an integer counter located in shared memory.

When a process tries to take the lock or enter the mutex, it performs a "down" operation on the futex, i.e. atomically subtracts 1 from the synchronization variable. If the variable becomes 0, there is no contention and the process carries on as usual.

If the variable is negative, there is contention, and the process puts itself to sleep with the futex_wait operation of the futex system call.

When the process releases the lock or leaves the mutex, it performs an "up" operation on the futex, i.e. atomically adds 1 to the synchronization variable. If the variable changes from 0 to 1, there is no contention and the process carries on as usual.

If the variable was negative before the addition, there is contention and one or more waiting processes must be woken with the futex_wake operation of the futex system call.
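The following C sketch (not from the original article; a deliberately simplified illustration with no retry loop or error handling, and futex_var is just an illustrative name) shows the fast-path/slow-path split described above: the counter is manipulated atomically in user space, and the kernel is entered through sys_futex only when contention is detected.

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

static atomic_int futex_var = 1;   /* 1 = free, 0 = held, negative = held with waiters */

static void futex_down(void)
{
    /* atomic "down": if the old value was 1 the lock was free, no kernel call is needed */
    if (atomic_fetch_sub(&futex_var, 1) != 1)
        /* contention: sleep in the kernel until another task issues FUTEX_WAKE */
        syscall(SYS_futex, &futex_var, FUTEX_WAIT, atomic_load(&futex_var), NULL, NULL, 0);
}

static void futex_up(void)
{
    /* atomic "up": a negative old value means at least one task is waiting */
    if (atomic_fetch_add(&futex_var, 1) < 0)
        syscall(SYS_futex, &futex_var, FUTEX_WAKE, 1, NULL, NULL, 0);
}

int main(void)
{
    futex_down();   /* uncontended fast path: no system call is made */
    futex_up();
    return 0;
}

A production implementation (glibc's mutexes, for example) re-checks the variable in a loop and handles races and spurious wakeups; the sketch only mirrors the protocol described in the text.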

al@al-System-Product-Name:~/perf$ perf bench futex all

# Running futex/hash benchmark...

Run summary [PID 3806]: 5 threads, each operating on 1024 [private] futexes for 10 secs.

[thread 0] futexes: 0x4003d20... 0x4004d1c [4635648 ops/sec]

[thread 1] futexes: 0x4004d30... 0x4005d2c [4611072 ops/sec]

[thread 2] futexes: 0x4005e70... 0x4006e6c [4254515 ops/sec]

[thread 3] futexes: 0x4006fb0... 0x4007fac [4559360 ops/sec]

[thread 4] futexes: 0x40080f0... 0x40090ec [4636262 ops/sec]

Averaged 4539371 operations/sec (+-1.60%), total secs = 10

# Running futex/wake benchmark...

Run summary [PID 3806]: blocking on 5 threads (at [private] futex 0x96b52c), waking up 1 at a time.

[Run 1]: Wokeup 5 of 5 threads in 0.0270 ms

[Run 2]: Wokeup 5 of 5 threads in 0.0370 ms

...

3.4 perf record

Run a command and save its data to perf.data. You can then use perf report for analysis.

perf record and perf report can analyze an application in more detail; perf record goes down to the function level and mixes the assembly with the source code inside a function.

Create a fork.c file for testing:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

void test_little(void)
{
    int i, j;
    for (i = 0; i < 30000000; i++)
        j = i;
}

void test_mdedium(void)
{
    int i, j;
    for (i = 0; i < 60000000; i++)
        j = i;
}

void test_high(void)
{
    int i, j;
    for (i = 0; i < 90000000; i++)
        j = i;
}

void test_hi(void)
{
    int i, j;
    for (i = 0; i < 120000000; i++)
        j = i;
}

int main(void)
{
    int i, pid, result;

    for (i = 0; i < 2; i++) {
        result = fork();
        if (result > 0)
            printf("i=%d parent parent=%d current=%d child=%d\n", i, getppid(), getpid(), result);
        else
            printf("i=%d child parent=%d current=%d\n", i, getppid(), getpid());

        if (i == 0) {
            test_little();
            sleep(1);
        } else {
            test_mdedium();
            sleep(1);
        }
    }

    pid = wait(NULL);
    test_high();
    printf("pid=%d wait=%d\n", getpid(), pid);
    sleep(1);
    pid = wait(NULL);
    test_hi();
    printf("pid=%d wait=%d\n", getpid(), pid);
    return 0;
}

Compile it with gcc fork.c -o fork -g -O0; the same method can be used to compare results with and without compiler optimization. -g is needed only for the call-graph feature, and -O0 turns optimization off.

Common options:

-e: specify the PMU event to record

--filter: event filter

-a: record events on all CPUs

-p: record the events of the process with the given pid

-o: specify the output file name for the recorded data

-g: enable call-graph recording

-C: record events on the specified CPU

sudo perf record -a -g ./fork generates a perf.data file in the current directory.

sudo perf report --call-graph none then shows the results; perf timechart analysis follows later.

The full report looks cluttered. To see only the information generated by fork:

sudo perf report --call-graph none -c fork

You can see that only the relevant symbols of the fork program and their occupancy are shown.

3.5 perf report

Analyze the data generated by perf record, and give the analysis results.

Common parameters:

-i: the input data file name; defaults to perf.data if not given.

-g: generate call graphs. For this the kernel needs symbol information (CONFIG_KALLSYMS enabled, not stripped), and user-space libraries or executables must be compiled with -g.

--sort: show statistics classified by higher-level keys such as pid, comm, dso, symbol, parent, cpu, socket, srcline, weight, local_weight.
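A sketch combining these options (the perf.data file is assumed to come from the perf record run above):

sudo perf report -i perf.data -g --sort comm,dso,symbol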

By running sudo perf report -i perf.data you can see the percentage of the main function and the respective percentages of funcA and funcB.

During the execution of funcB the apic timer also fires and takes part of the CPU; apart from that, the ratio is basically 1:10.

The proportions of funcA and funcB are basically in line with expectations, so drill into longa to analyze the hot spots.

In the mixed C/assembly view you can see that the for loop accounts for 69.92% and the j=i assignment for 30.08%.

According to the above description, we can see that top is suitable for monitoring the performance of the whole system, stat is more suitable for the performance analysis of a single program, and record/report is more suitable for more fine-grained analysis of the program.

Note:

When using perf report -g, you may get: Failed to open /lib/libpthread-0.9.33.2.so, continuing without symbols.

Check the file with file xxx; if it reports "stripped", the file contains no symbol information and you need a "not stripped" build.

3.6 perf timechart

Perf timechart is a tool for graphing previous statistical information.

perf timechart record records the events of the whole system or of an application; options can be added to record only specific types of events.

Perf timechart is used to convert perf.data to SVG format, and SVG can be opened through Inkscape or browser.

Perf timechart record can specify specific types of events:

-P: record power-related events

-T: record task-related events

-I: record I/O-related events

-g: record function call relationships

perf timechart converts the perf.data recorded by perf timechart record into output.svg.

-w adjusts the width of the output svg file so that more detail is visible.

-p restricts the output to certain processes, e.g. sudo perf timechart -p test1 -p thermald

-o specifies the output file name

-i specifies the input file name to parse

-w sets the width of the output SVG file

-P shows only power-related events

-T, --tasks-only shows task information but not processor information

-p shows only the specified process name or PID

--symfs=<path> specifies the system symbol table path

-t, --topology sorts CPUs according to their topology

--highlight=<duration> highlights tasks that run longer than the given time

When too many threads slow down SVG rendering, use -p to restrict the analysis to specific threads; to include several threads, repeat -p xxx for each one.

sudo perf timechart record -T ./fork && sudo perf timechart -p fork

As a result, you can see the names of the relevant tasks, their start/end times, and their state at each point in time (Running / Idle / Deeper Idle / Deepest Idle / Sleeping / Waiting for CPU / Blocked on IO).

3.6.1 Analyzing function proportions with perf timechart and perf report

According to perf report, the proportions of test_little, test_mdedium, test_high and test_hi are 3.84%, 12.01%, 22.99% and 30.43% respectively.

From the code, if test_little is 1 unit of work, then test_mdedium is 2 units, test_high is 3 units, and test_hi is 4 units.

The four functions execute 2, 4, 4 and 4 times respectively, so the CPU share per unit of work for each function is:

test_little: 3.84% / 2 / 1 ≈ 1.9%
test_mdedium: 12.01% / 4 / 2 ≈ 1.5%
test_high: 22.99% / 4 / 3 ≈ 1.9%
test_hi: 30.43% / 4 / 4 ≈ 1.9%

Basically in line with expectations.

Recording IO events, you can see per-application Disk/Network/Sync/Poll/Error information as well as per-application data throughput:

sudo perf timechart record -I && sudo perf timechart -w 1800

Recording power-state events, the difference is that Idle is further broken down into C-states (C1/C2 and so on) so that the power state is shown in more detail:

sudo perf timechart record -P && sudo perf timechart -w 1800

3.7 perf script

perf script is used to read the raw trace data saved by perf record.

How to use it:

perf script [<options>]

perf script [<options>] record <script> [<record-options>] <command>

perf script [<options>] report <script> [script-args]

perf script [<options>] <script> <required-script-args> [<record-options>] <command>

perf script [<options>] <top-script> [script-args]

You can also write perl or python scripts for data analysis.
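An illustrative sequence (the generated file name perf-script.py is what current perf versions produce; treat it as an assumption):

sudo perf record -a sleep 5
sudo perf script -g python          # generate a Python script skeleton for the recorded events
sudo perf script -s perf-script.py  # run the script against perf.data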

3.8 perf lock

3.8.1 perf lock kernel configuration

To use this feature, the kernel must be built with the options CONFIG_LOCKDEP and CONFIG_LOCK_STAT.

CONFIG_LOCKDEP defines the acquired and release events.

CONFIG_LOCK_STAT defines the contended and acquired lock events.

CONFIG_LOCKDEP=y

CONFIG_LOCK_STAT=y

3.8.2 perf lock usage

Analyze kernel lock statistics.

Locks are the kernel's synchronization method. Once a lock is held, other kernel execution paths that need it must wait, reducing parallelism; incorrect locking can also cause deadlocks.

Therefore, the analysis of kernel locks is an important tuning work.

perf lock record: records the lock events of the executed command into perf.data

perf lock report: generates a statistical report

perf lock script: displays the raw lock events

perf lock report options and output fields:

-k: sort key; the default is acquired, and it can also sort by contended, wait_total, wait_max and wait_min.

Name: name of the kernel lock.

acquired: the number of times the lock was acquired directly, with no other kernel path holding it, so no waiting was needed.

contended: the number of times the lock was acquired only after waiting because another kernel path held it.

total wait: the total time spent waiting for the lock.

max wait: the maximum time spent waiting for the lock.

min wait: the minimum time spent waiting for the lock.
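A minimal usage sketch with the options described above:

sudo perf lock record ls            # record lock events while running ls
sudo perf lock report -k wait_total # report, sorted by total wait time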

3.9 perf kmem

3.9.1 perf kmem introduction

Perf kmem is used to track and measure kernel slab allocator event information.

Such as memory allocation / release, etc. It can be used to study where a program allocates a large amount of memory, or where fragmentation occurs and other memory management-related issues.

perf kmem and perf lock are really specializations of perf's tracepoint support, roughly equivalent to perf record -e kmem:* and perf record -e lock:*.

However, these tools aggregate and analyze the data internally, so their statistical reports are more readable.

perf kmem record: records the kernel slab allocator events of a command

perf kmem stat: generates kernel slab allocator statistics

Options:

--caller shows statistics per call site

--alloc shows each memory allocation event

-s, --sort=<key>

Sort the output (default: frag,hit,bytes for slab and bytes,hit for page). Available sort keys are ptr, callsite, bytes, hit, pingpong, frag for slab; and page, callsite, bytes, hit, order, migtype, gfp for page. This option must be preceded by one of the mode selection options, i.e. --slab, --page, --alloc and/or --caller.

-l <num> shows only a fixed number of lines

--raw-ip prints the raw instruction pointer instead of the symbol

--slab analyzes slab allocator events

--page analyzes page allocation events

--live shows live page statistics. perf kmem shows total allocation statistics by default; this option shows live (currently allocated) pages instead. (Works with the --page option only.)

3.9.2 Using perf kmem

sudo perf kmem record ls records the slab allocator events of ls.

sudo perf kmem stat displays only the summary statistics:

SUMMARY (SLAB allocator)
========================
Total bytes requested: 368589
Total bytes allocated: 369424
Total bytes wasted on internal fragmentation: 835
Internal fragmentation: 0.226028%
Cross CPU allocations: 0/2256

sudo perf kmem --alloc --caller --slab stat shows more detailed, classified information:

---------------------------------------------------------------------------------------------------------
 Callsite                        | Total_alloc/Per | Total_req/Per  |  Hit  | Ping-pong | Frag
---------------------------------------------------------------------------------------------------------
 proc_reg_open+32                |        64/64    |       40/40    |     1 |         0 | 37.500%
 seq_open+34                     |       384/192   |      272/136   |     2 |         0 | 29.167%
 apparmor_file_alloc_security+5c |       608/32    |      456/24    |    19 |         1 | 25.000%
 ext4_readdir+8bd                |        64/64    |       48/48    |     1 |         0 | 25.000%
 ext4_htree_store_dirent+3e      |       896/68    |      770/59    |    13 |         0 | 14.062%
 load_elf_phdrs+64               |      1024/512   |      896/448   |     2 |         0 | 12.500%
 load_elf_binary+222             |        32/32    |       28/28    |     1 |         0 | 12.500%
 anon_vma_prepare+11b            |      1280/80    |     1152/72    |    16 |         0 | 10.000%
 inotify_handle_event+75         |     73664/64    |    66758/58    |  1151 |         0 |  9.375%
 do_execveat_common.isra.33+e5   |      2048/256   |     1920/240   |     8 |         1 |  6.250%
 ...                             | ...             | ...            | ...   | ...       | ...
---------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------------
 Alloc Ptr                       | Total_alloc/Per | Total_req/Per  |  Hit  | Ping-pong | Frag
---------------------------------------------------------------------------------------------------------
 0xffff8800ca4d86c0              |       192/192   |      136/136   |     1 |         0 | 29.167%
 0xffff8801ea05aa80              |       192/192   |      136/136   |     1 |         0 | 29.167%
 0xffff8801f6ad6540              |        96/96    |       68/68    |     1 |         0 | 29.167%
 0xffff8801f6ad6f00              |        96/96    |       68/68    |     1 |         0 | 29.167%
 0xffff880214e65e80              |        96/32    |       72/24    |     3 |         0 | 25.000%
 0xffff8801f45ddac0              |        64/64    |       48/48    |     1 |         0 | 25.000%
 0xffff8800ac4093c0              |        32/32    |       24/24    |     1 |         1 | 25.000%
 0xffff8800af5a4260              |        32/32    |       24/24    |     1 |         0 | 25.000%
 0xffff880214e651e0              |        32/32    |       24/24    |     1 |         0 | 25.000%
 0xffff880214e65220              |        32/32    |       24/24    |     1 |         0 | 25.000%
 0xffff880214e654e0              |        32/32    |       24/24    |     1 |         0 | 25.000%
---------------------------------------------------------------------------------------------------------

SUMMARY (SLAB allocator)
========================
Total bytes requested: 409260
Total bytes allocated: 417008
Total bytes wasted on internal fragmentation: 7748
Internal fragmentation: 1.857998%
Cross CPU allocations: 0/2833

The report has three parts. The first is organized by Callsite, i.e. the places in the kernel code where kmalloc and kfree are called.

For example, the Hit column of proc_reg_open above is 1, which means the function called kmalloc once during the record.

In its Total_alloc/Per column, 64/64 means the function allocated 64 bytes in total, and Per is the average per allocation.

The two more interesting columns are Ping-pong and Frag. Frag is easy to understand: internal fragmentation. Although the slab allocator is meant to address the internal fragmentation of the buddy system, slab itself still has internal fragmentation; for example, if a cache object size is 1024 but the data structure being allocated is 1022 bytes, 2 bytes are wasted. Frag is the proportion of such waste.

Ping-pong describes a phenomenon in multi-CPU systems: memory shared by several CPUs "ping-pongs" between them. One CPU allocates memory, and another CPU may access the object or eventually free it. Since L1 caches are per-CPU, when one CPU modifies the memory the caches of the other CPUs must be updated, which costs performance. perf kmem compares the CPU in the kfree event with the CPU in the kmalloc event; if they differ, it counts a ping-pong. Ideally the ping-pong count is as small as possible. An article about oprofile on IBM developerWorks discusses cache tuning and is a good reference.

Callsite: where kmalloc and kfree are called in the kernel code.

Total_alloc/Per: the total amount of memory allocated and the average per allocation.

Total_req/Per: the total amount of memory requested and the average per request.

Hit: the number of calls.

Ping-pong: the number of times kmalloc and kfree were not executed on the same CPU, which hurts cache efficiency.

Frag: the fragmentation percentage; fragmentation = allocated memory - requested memory, which is wasted.

With the --alloc option you also see Alloc Ptr, the address of the allocated memory.

This is followed by a section organized by allocation address (Alloc Ptr).

The last part is the summary, showing the total allocated memory and fragmentation; the Cross CPU allocations line is the ping-pong summary.

To analyze page events, add the --page option when recording: sudo perf kmem record --page ls, then view the results with sudo perf kmem stat --page:

0xee318 [0x8]: failed to process type: 68

Error during process events: -22

3.10 perf sched

perf sched is dedicated to tracing and measuring the scheduler, including latencies.

perf sched record: records scheduling events during the test

perf sched latency: reports per-thread scheduling latency and other scheduling properties

perf sched script: shows the detailed trace recorded

perf sched replay: replays the execution recorded by perf sched record

perf sched map: prints context switches using single-character task identifiers

After performing the sudo perf sched record ls, view the results in different ways.

sudo perf sched latency shows the Average delay/Maximum delay of the ls process. The columns mean: Task: process name and pid; Runtime: actual run time; Switches: number of context switches; Average delay: average scheduling delay; Maximum delay: maximum scheduling delay.

The most noteworthy thing here is Maximum delay, from which you can see the feature that has the greatest impact on interactivity: scheduling delay. If the scheduling delay is relatively large, then users will feel intermittent video or audio.

------------------------------------------------------------------------------------------------------------------
  Task                 |  Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at
------------------------------------------------------------------------------------------------------------------
  /usr/bin/termin:2511 |     0.163 ms |        1 | avg:    0.019 ms | max:    0.019 ms | max at: 5398.723467 s
  ls:7806              |     1.175 ms |        1 | avg:    0.017 ms | max:    0.017 ms | max at: 5398.722333 s
  kworker/u12:3:7064   |     0.029 ms |        1 | avg:    0.011 ms | max:    0.011 ms | max at: 5398.723434 s
  migration/4:27       |     0.000 ms |        1 | avg:    0.007 ms | max:    0.007 ms | max at: 5398.722575 s
  perf:7801            |     1.256 ms |        1 | avg:    0.002 ms | max:    0.002 ms | max at: 5398.723509 s
------------------------------------------------------------------------------------------------------------------
  TOTAL:               |     2.623 ms |        5 |
------------------------------------------------------------------------------------------------------------------

sudo perf sched script shows more detailed sched information, including sched_wakeup/sched_switch events and so on. The columns are: process name, pid, CPU ID, timestamp.

perf 7801 [002] 5398.722314: sched:sched_stat_sleep: comm=perf pid=7806 delay=110095391 [ns]
perf 7801 [002] 5398.722316: sched:sched_wakeup: comm=perf pid=7806 prio=120 target_cpu=004
swapper 0 [004] 5398.722328: sched:sched_stat_wait: comm=perf pid=7806 delay=0 [ns]
swapper 0 [004] 5398.722333: sched:sched_switch: prev_comm=swapper/4 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=7806 next_prio=120
perf 7801 [002] 5398.722363: sched:sched_stat_runtime: comm=perf pid=7801 runtime=1255788 [ns] vruntime=3027478102 [ns]
perf 7801 [002] 5398.722364: sched:sched_switch: prev_comm=perf prev_pid=7801 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
perf 7806 [004] 5398.722568: sched:sched_wakeup: comm=migration/4 pid=27 prio=0 target_cpu=004
perf 7806 [004] 5398.722571: sched:sched_stat_runtime: comm=perf pid=7806 runtime=254732 [ns] vruntime=1979611107 [ns]
perf 7806 [004] 5398.722575: sched:sched_switch: prev_comm=perf prev_pid=7806 prev_prio=120 prev_state=R+ ==> next_comm=migration/4 next_pid=27 next_prio=0
migration/4 27 [004] 5398.722582: sched:sched_stat_wait: comm=perf pid=7806 delay=13914 [ns]
migration/4 27 [004] 5398.722586: sched:sched_migrate_task: comm=perf pid=7806 prio=120 orig_cpu=4 dest_cpu=2
swapper 0 [002] 5398.722595: sched:sched_stat_wait: comm=perf pid=7806 delay=0 [ns]
swapper 0 [002] 5398.722596: sched:sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=7806 next_prio=120
migration/4 27 [004] 5398.722611: sched:sched_switch: prev_comm=migration/4 prev_pid=27 prev_prio=0 prev_state=S ==> next_comm=swapper/4 next_pid=0 next_prio=120
ls 7806 [002] 5398.723421: sched:sched_stat_sleep: comm=kworker/u12:3 pid=7064 delay=1226675 [ns]
ls 7806 [002] 5398.723423: sched:sched_wakeup: comm=kworker/u12:3 pid=7064 prio=120 target_cpu=003
swapper 0 [003] 5398.723432: sched:sched_stat_wait: comm=kworker/u12:3 pid=7064 delay=0 [ns]
swapper 0 [003] 5398.723434: sched:sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/u12:3 next_pid=7064 next_prio=120
kworker/u12:3 7064 [003] 5398.723441: sched:sched_stat_sleep: comm=/usr/bin/termin pid=2511 delay=80833386 [ns]
kworker/u12:3 7064 [003] 5398.723447: sched:sched_wakeup: comm=/usr/bin/termin pid=2511 prio=120 target_cpu=004
kworker/u12:3 7064 [003] 5398.723449: sched:sched_stat_runtime: comm=kworker/u12:3 pid=7064 runtime=29315 [ns] vruntime=846439549943 [ns]
kworker/u12:3 7064 [003] 5398.723451: sched:sched_switch: prev_comm=kworker/u12:3 prev_pid=7064 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
swapper 0 [004] 5398.723462: sched:sched_stat_wait: comm=/usr/bin/termin pid=2511 delay=0 [ns]
swapper 0 [004] 5398.723466: sched:sched_switch: prev_comm=swapper/4 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=/usr/bin/termin next_pid=2511 next_prio=120
ls 7806 [002] 5398.723503: sched:sched_migrate_task: comm=perf pid=7801 prio=120 orig_cpu=2 dest_cpu=3
ls 7806 [002] 5398.723505: sched:sched_stat_sleep: comm=perf pid=7801 delay=1142537 [ns]
ls 7806 [002] 5398.723506: sched:sched_wakeup: comm=perf pid=7801 prio=120 target_cpu=003
ls 7806 [002] 5398.723508: sched:sched_stat_runtime: comm=ls pid=7806 runtime=920005 [ns] vruntime=3028398107 [ns]
swapper 0 [003] 5398.723508: sched:sched_stat_wait: comm=perf pid=7801 delay=0 [ns]
swapper 0 [003] 5398.723508: sched:sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=7801 next_prio=120
ls 7806 [002] 5398.723510: sched:sched_switch: prev_comm=ls prev_pid=7806 prev_prio=120 prev_state=x ==> next_comm=swapper/2 next_pid=0 next_prio=120
/usr/bin/termin 2511 [004] 5398.723605: sched:sched_stat_runtime: comm=/usr/bin/termin pid=2511 runtime=162720 [ns] vruntime=191386139371 [ns]
/usr/bin/termin 2511 [004] 5398.723611: sched:sched_switch: prev_comm=/usr/bin/termin prev_pid=2511 prev_prio=120 prev_state=S ==> next_comm=swapper/4 next_pid=0 next_prio=120

The advantage of sudo perf sched map is that it provides an overall view: it summarizes hundreds of scheduling events and shows how tasks are distributed across the CPUs. If there is a bad migration decision, for example a task that is not migrated to an idle CPU in time but instead moved to another busy CPU, this kind of scheduler problem is visible at a glance in the map report.

The asterisk indicates the CPU where the scheduling event occurs.

The dot indicates that the CPU is IDLE.

  *A0           5398.722333 secs A0 => perf:7806
  *.   A0       5398.722365 secs .  => swapper:0
   .  *B0       5398.722575 secs B0 => migration/4:27
  *A0  B0       5398.722597 secs
   A0 *.        5398.722611 secs
   A0 *C0  .    5398.723434 secs C0 => kworker/u12:3:7064
   A0 *.   .    5398.723451 secs
   A0  .  *D0   5398.723467 secs D0 => /usr/bin/termin:2511
   A0 *E0  D0   5398.723509 secs E0 => perf:7801
  *.   E0  D0   5398.723510 secs
   .   E0 *.    5398.723612 secs

perf sched replay is a tool designed for scheduler developers: it attempts to replay the scheduling scenario recorded in the perf.data file. In many cases, an ordinary user who notices odd scheduler behaviour cannot describe the scenario precisely, or the test scenario is hard to reproduce, or the developer is simply being "lazy". With perf sched replay, perf simulates the scenario in perf.data so that developers do not have to spend a lot of effort reproducing it. This is especially helpful for debugging, where the same scenario must be replayed over and over to check whether a new change improves the problem found in the original scheduling scenario.

Run measurement overhead: 166 nsecs

Sleep measurement overhead: 52177 nsecs

The run test took 999975 nsecs

The sleep test took 1064623 nsecs

Nr_run_events: 11

Nr_sleep_events: 581

Nr_wakeup_events: 5

Task 0 (swapper: 0), nr_events: 11

Task 1 (swapper: 1), nr_events: 1

Task 2 (swapper: 2), nr_events: 1

Task 3 (kthreadd: 3), nr_events: 1

...

Task 563 (kthreadd: 7509), nr_events: 1

Task 564 (bash: 7751), nr_events: 1

Task 565 (man: 7762), nr_events: 1

Task 566 (kthreadd: 7789), nr_events: 1

Task 567 (bash: 7800), nr_events: 1

Task 568 (sudo: 7801), nr_events: 4

Task 569 (perf: 7806), nr_events: 8

# 1: 25.887, ravg: 25.89, cpu: 1919.68 / 1919.68

# 2: 27.994, ravg: 26.10, cpu: 2887.76 / 2016.49

# 3: 26.403, ravg: 26.13, cpu: 2976.09 / 2112.45

# 4: 29.400, ravg: 26.46, cpu: 1015.01 / 2002.70

# 5: 26.750, ravg: 26.49, cpu: 2942.80 / 2096.71

# 6: 27.647, ravg: 26.60, cpu: 3087.37 / 2195.78

# 7: 31.405, ravg: 27.08, cpu: 2762.43 / 2252.44

# 8: 23.770, ravg: 26.75, cpu: 2172.55 / 2244.45

# 9: 26.952, ravg: 26.77, cpu: 2794.93 / 2299.50

# 10: 30.904, ravg: 27.18, cpu: 973.26 / 2166.88

3.11 perf probe

Need to find vmlinux XXXXXXXXXXXXXXXXXX

Probe points can be customized.

Define new dynamic tracepoints.

Use examples

(1) Display which lines in schedule() can be probed:

# perf probe --line schedule

Lines shown with a line number in front can be probed; lines without one cannot.

(2) Add a probe on line 12 of the schedule() function:

# perf probe -a schedule:12

This adds a probe point at line 12 of the schedule function.
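A follow-up sketch (the event name probe:schedule is an assumption; use whatever name perf probe prints when the point is added):

sudo perf probe -l                               # list the dynamic events currently defined
sudo perf record -e probe:schedule -aR sleep 1   # record the new event
sudo perf probe -d probe:schedule                # delete the probe point when finished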

3.14 perf trace

perf trace is similar to strace, but it adds analysis of other system events, such as page faults, task lifetime events, scheduling events, and so on.
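Two minimal usage sketches (the PID is hypothetical):

sudo perf trace ls          # strace-like view of the system calls made by ls
sudo perf trace -p 1234     # attach to an existing process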

The following command can view the scripts that have been installed on the system:

# perf trace -l
List of available trace scripts:
  syscall-counts [comm]            system-wide syscall counts
  syscall-counts-by-pid [comm]     system-wide syscall counts, by pid
  failed-syscalls-by-pid [comm]    system-wide failed syscalls, by pid

For example, the failed-syscalls script is executed as follows:

# perf trace record failed-syscalls
^C[ perf record: Woken up 11 times to write data ]
[ perf record: Captured and wrote 1.939 MB perf.data (~84709 samples) ]

# perf trace report failed-syscalls
perf trace started with Perl script /root/libexec/perf-core/scripts/perl/failed-syscalls.pl

failed syscalls, by comm:

comm                    # errors
--------------------  ----------
firefox                     1721
claws-mail                   149
konsole                       99
X                             77
emacs                         56
[...]

failed syscalls, by syscall:

syscall                 # errors
--------------------  ----------
sys_read                    2042
sys_futex                    130
sys_mmap_pgoff                71
sys_access                    33
sys_stat64                     5
sys_inotify_add_watch          4
[...]

The report shows the number of failures by process and by system call. It is simple and direct; with a plain perf record plus perf report you would have to count these numbers manually or with your own script.

4. Extended perf applications

4.1 Flame Graph

FlameGraph is Brendan Gregg's flame graph tool (https://github.com/brendangregg/FlameGraph).

1. Grab perf information and convert it

perf record -F 99 -a -g -- sleep 60
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl out.folded > kernel.svg

The above is the introduction to and use of the analysis tool perf shared by the editor. If you have similar questions, the analysis above should help you understand them.
