What is the function of the strace command

Updated 2025-02-24. Source: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02.

This article explains what the strace command does. Many people are unsure about its purpose in day-to-day operations, so the material below has been collected and organized into simple, practical methods. Hopefully it answers the question of what strace is for.

What is strace? System trace

According to the strace website, strace is a Linux user-space tracer that can be used for diagnostics, debugging, and teaching. We use it to monitor the interaction between user-space processes and the kernel: system calls, signal deliveries, changes of process state, and so on.

Under the hood, strace relies on the kernel's ptrace facility.

In day-to-day operations work, fault handling and problem diagnosis are a major part of the job and a necessary skill. As a dynamic tracing tool, strace helps operators locate process and service faults efficiently. It works like a detective, using the clues left by system calls to reveal the truth behind an anomaly.

What can strace do?

Operations engineers are practical people, so let's start with an example.

We copied a software package called some_server from another machine. The developer said it could be started directly, with nothing to change. But when we tried to start it, it reported an error and would not come up at all.

Start command:

./some_server ../conf/some_server.conf

Output:

FATAL: InitLogFile failed iRet: -1
Init error: -1655

Why won't it start? Judging from the log, initialization of the log file failed, but what is really going on? Let's look at it with strace:

strace -tt -f ./some_server ../conf/some_server.conf

In the strace output, the line just before the InitLogFile failed error is an open system call:

23:14:24.448034 open("/usr/local/apps/some_server/log/server_agent.log", O_RDWR|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = -1 ENOENT (No such file or directory)

It tries to open the file /usr/local/apps/some_server/log/server_agent.log for writing (creating it if it does not exist), but the call fails: the return value is -1 and errno is ENOENT. Check the man page of the open system call:

man 2 open

Search for the description of the ENOENT error:

ENOENT O_CREAT is not set and the named file does not exist. Or, a directory component in pathname does not exist or is a dangling symbolic link.

Now it is clear. The open call in our example sets O_CREAT, so errno is ENOENT here because some component of the log path does not exist or is a dangling symbolic link. Let's see which part of the path is missing:

ls -l /usr/local/apps/some_server/log
ls: cannot access /usr/local/apps/some_server/log: No such file or directory
ls -l /usr/local/apps/some_server
total 8
drwxr-xr-x 2 root users 4096 May 14 23:13 bin
drwxr-xr-x 2 root users 4096 May 14 22:48 conf

The log subdirectory does not exist! Upper-level directories all exist. After manually creating the log subdirectory, the service can start normally.
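The same failure is easy to reproduce outside some_server. The following is a minimal Python sketch, not the server's actual code: it issues the same kind of open call (O_RDWR | O_CREAT | O_APPEND) against a path whose parent directory is missing, then repeats it after creating the directory. The helper names try_open_log and demo, and the temporary directory, are assumptions made for illustration.

```python
import errno
import os
import tempfile


def try_open_log(path):
    """Open path with the flags seen in the strace output.

    Returns the symbolic errno name on failure, or None on success.
    """
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_APPEND, 0o666)
    except OSError as exc:
        # errno.errorcode maps the number to its name, e.g. 2 -> "ENOENT"
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None


def demo():
    """Return (result before creating log/, result after creating log/)."""
    with tempfile.TemporaryDirectory() as base:
        log_path = os.path.join(base, "log", "server_agent.log")
        before = try_open_log(log_path)      # parent directory "log" is missing
        os.mkdir(os.path.join(base, "log"))  # the manual fix described above
        after = try_open_log(log_path)       # O_CREAT can now create the file
        return before, after


if __name__ == "__main__":
    print(demo())
```

O_CREAT only creates the final file; it never creates missing directories along the path, which is why the first call fails and the second succeeds.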

Looking back, what exactly can strace do?

It opens up the black box of an application process and, through the clues of its system calls, tells you roughly what the process is doing.

How do I use strace?

Since strace traces the system calls and signals of user-space processes, we need to understand what system calls are before turning to how strace is used.

About system calls:

According to Wikipedia, a system call is the way a program running in user space requests a service that requires higher privileges from the operating system kernel. System calls provide the interface between user programs and the operating system.

The operating system divides its address space into user space and kernel space:

The operating system kernel runs directly on the hardware and provides device management, memory management, task scheduling, and other functions.

User space gets its work done by requesting kernel services through an API. The APIs that the kernel provides to user space are the system calls.

On Linux, application code normally uses system calls indirectly, through wrapper functions provided by the glibc library.
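To make the wrapper layer concrete, here is a small Python sketch that calls two C library wrapper functions directly through ctypes. It assumes a Linux system where ctypes.CDLL(None) exposes the C library already loaded into the process; the helper names pid_via_libc and access_errno are made up for this illustration.

```python
import ctypes
import os

# Handle to the C library already linked into this process (assumes Linux).
libc = ctypes.CDLL(None, use_errno=True)


def pid_via_libc():
    """Call the C library's getpid() wrapper, which issues the getpid system call."""
    return libc.getpid()


def access_errno(path):
    """Call the access() wrapper; return 0 on success, else the errno it set."""
    rc = libc.access(os.fsencode(path), os.F_OK)
    return 0 if rc == 0 else ctypes.get_errno()


if __name__ == "__main__":
    print(pid_via_libc(), os.getpid())
    print(access_errno("/"), access_errno("/no/such/dir/x"))
```

When a wrapper fails it returns -1 and stores the kernel's error number in errno, which is the same return value and error pair that strace prints at the end of each line.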

The Linux kernel currently has more than 300 system calls, and a detailed list can be seen on the syscalls man page. These system calls fall into several main categories:

File and device access: open/close/read/write/chmod, etc.

Process management: fork/clone/execve/exit/getpid, etc.

Signals: signal/sigaction/kill, etc.

Memory management: brk/mmap/mlock, etc.

Inter-process communication (IPC): shmget/semget, etc., covering semaphores, shared memory, and message queues

Network communication: socket/connect/sendto/sendmsg, etc.

Familiarity with Linux system calls and system programming makes strace much easier to use. For operations troubleshooting, though, it is usually enough to know that strace exists and to be able to look things up in the system call manual.

Readers who want to go deeper are advised to read books such as Linux System Programming and Advanced Programming in the UNIX Environment.

Let's get back to using strace. strace has two modes of operation.

In the first, strace starts the process to be traced. Usage is simple: just put strace in front of the original command. For example, to trace the execution of the command "ls -lh /var/log/messages", run:

strace ls -lh /var/log/messages

In the second, strace attaches to a process that is already running, so you can see what it is doing without interrupting it. In this case, pass the -p pid option to strace.

For example, if there is a running some_server service, first look up its pid:

pidof some_server
17553

With the pid 17553 in hand, trace its execution with strace:

strace -p 17553

When you have seen enough, press Ctrl+C to end strace.

strace has options that adjust its behavior. Here are a few of the more commonly used ones, followed by examples of their practical use.

Common options for strace:

Take a sample command:

strace -tt -T -v -f -e trace=file -o /data/log/strace.log -s 1024 -p 23489

-tt prefixes each output line with the time of day, down to the microsecond

-T shows the time spent in each system call

-v prints the full contents of environment variables, file stat structures, and the like for the calls that involve them

-f traces the target process and all child processes it creates

-e controls which events are traced and how, for example by naming the system calls to trace

-o writes the strace output to the specified file

-s sets the maximum length printed for string arguments of system calls; the default is 32 bytes

-p specifies the pid of the process to trace. To trace several pids at once, repeat the -p option.
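If these invocations are assembled from a script, the options listed above can be combined programmatically. The following Python sketch only builds the argument list and does not run strace; the function name build_strace_cmd and its parameter names are hypothetical, and it covers only the options described in this article.

```python
import shlex


def build_strace_cmd(target, output=None, trace=None, strsize=None,
                     timestamps=True, timing=False, verbose=False,
                     follow_children=True):
    """Return an argv list for strace.

    target is either an int (attach to that pid with -p) or a list of
    strings (the command for strace to launch).
    """
    cmd = ["strace"]
    if timestamps:
        cmd.append("-tt")
    if timing:
        cmd.append("-T")
    if verbose:
        cmd.append("-v")
    if follow_children:
        cmd.append("-f")
    if trace:
        cmd += ["-e", f"trace={trace}"]
    if output:
        cmd += ["-o", output]
    if strsize is not None:
        cmd += ["-s", str(strsize)]
    if isinstance(target, int):
        cmd += ["-p", str(target)]   # attach mode
    else:
        cmd += list(target)          # launch mode
    return cmd


if __name__ == "__main__":
    argv = build_strace_cmd(23489, output="/data/log/strace.log", trace="file",
                            strsize=1024, timing=True, verbose=True)
    print(shlex.join(argv))
```

Run as a script, this prints the sample command shown above. Passing a list such as ["./nginx"] as the target yields the launch form instead of the attach form.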

Example: trace nginx to see which files it accesses at startup

strace -tt -T -f -e trace=file -o /data/log/strace.log -s 1024 ./nginx

In the output, the first column is the pid of the process, followed by the time with microsecond precision, which is the effect of the -tt option.

The last column of each line shows how long the call took, which is the effect of the -T option.

The output shows only calls related to file access, because we specified -e trace=file.

strace troubleshooting cases

1. Locating an abnormal process exit

Problem: a long-running script called run.sh on the machine dies after running for about a minute. We need to find out why.

Diagnosis: while the process is still running, get its pid with the ps command (assume it is 24298) and attach strace:

strace -o strace.log -tt -p 24298

Looking at strace.log, the last two lines read:

22:47:42.803937 wait4(-1,  <unfinished ...>
22:47:43.228422 +++ killed by SIGKILL +++

So the process was killed by another process with a SIGKILL signal.

Further analysis showed that another service on the machine has a monitoring script that watches a process which is also called run.sh. When it finds more than 2 run.sh processes, it kills the process and restarts it. As a result, our run.sh script was killed by mistake.

When a process is killed, strace prints killed by SIGX (where SIGX is the signal sent to the process). What does it print when a process exits on its own?

Here is a program called test_exit, with the following code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    exit(1);
}

Let's run it under strace and see what trace it leaves when it exits.

strace -tt -e trace=process -f ./test_exit

Note: -e trace=process means that only system calls related to process management are traced.

Output:

23:07:24.672849 execve("./test_exit", ["./test_exit"], [/* 35 vars */]) = 0
23:07:24.674665 arch_prctl(ARCH_SET_FS, 0x7f1c0eca7740) = 0
23:07:24.675108 exit_group(1) = ?
23:07:24.675259 +++ exited with 1 +++

As you can see, when a process exits on its own (by calling the exit function or returning from main), the last call it makes is the exit_group system call, and strace prints exited with X (where X is the exit code).

Some readers may wonder: the code clearly calls exit, so why does it show up as exit_group?

The reason is that exit here is not a system call but a function provided by the glibc library. A call to exit ends up as an exit_group system call, which terminates all threads of the current process. There is also a lower-level exit system call, documented as _exit() (note the leading underscore), which at the kernel level terminates only the calling thread; it is what ultimately runs when a single thread exits.
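The two endings that strace reports, exited with X and killed by SIGX, are also visible to a parent process through the child's wait status. The sketch below is a Python illustration, not part of the original cases: it starts one child that exits by itself and one that is killed with SIGKILL, and phrases the result the way strace does. The function names are made up, and the children are small Python one-liners standing in for test_exit and run.sh.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Phrase a subprocess return code the way strace phrases its last line."""
    if returncode < 0:
        # subprocess reports death by signal N as the return code -N
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with {returncode}"


def run_and_exit(code):
    """The child terminates itself by calling sys.exit(code)."""
    proc = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return describe_exit(proc.returncode)


def run_and_kill():
    """The child sleeps and the parent sends it SIGKILL."""
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    proc.kill()   # sends SIGKILL on POSIX systems
    proc.wait()
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    print(run_and_exit(1))
    print(run_and_kill())
```

A process cannot catch SIGKILL, so in the second case it never gets the chance to call exit_group itself, which is why the trace ends with the kill notice instead.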

2. Locating a shared memory error

A service reports an error at startup:

shmget 267264 30097568: Invalid argument
Can not get shm...exit!

The error log suggests that acquiring shared memory failed. Let's look at it with strace:

strace -tt -f -e trace=ipc ./a_mon_svr ./conf/a_mon_svr.conf

Output:

22:46:36.351798 shmget(0x5feb, 12000, 0666) = 0
...
22:46:36.355439 shmget(0x41400, 30097568, 0666) = -1 EINVAL (Invalid argument)
shmget 267264 30097568: Invalid argument
Can not get shm...exit!

Here the -e trace=ipc option makes strace trace only the system calls related to inter-process communication.

The strace output tells us that the shmget system call failed with errno EINVAL. As before, look up the shmget man page and search for the description of EINVAL:

EINVAL A new segment was to be created and size < SHMMIN or size > SHMMAX, or no new segment was to be created, a segment with given key existed, but size is greater than the size of that segment.

So shmget sets EINVAL for one of the following reasons:

The shared memory segment to be created is smaller than SHMMIN (usually 1 byte)

The shared memory segment to be created is larger than SHMMAX (set by the kernel parameter kernel.shmmax)

A shared memory segment with the specified key already exists, and the size passed to shmget is larger than the size of that segment

From the strace output, the shared memory we want to attach to has key 0x41400 and a requested size of 30097568 bytes, which clearly matches neither the first nor the second case. That leaves only the third. Use ipcs to check whether the sizes really differ:

ipcs -m | grep 41400
key        shmid      owner      perms      bytes      nattch     status
0x00041400 1015822    root                  30095516   1

As you can see, the 0x41400 key already exists, and its size is 30095516 bytes, which does not match 30097568 of our call parameters, so this error occurs.
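The elimination reasoning can be written down as a small decision function. This is a Python sketch of the EINVAL rules as quoted from the man page, not kernel code; the function name and parameters are hypothetical, and shmmax is left as an optional argument because its value on the machine in question is not given.

```python
def shmget_einval_reason(requested, existing=None, shmmin=1, shmmax=None):
    """Return why shmget would fail with EINVAL, or None if no rule applies.

    requested: size passed to shmget
    existing:  size of the segment already registered under the key, or None
    shmmin, shmmax: system limits (shmmax=None means it is not checked here)
    """
    if existing is None:
        # A new segment would be created: the size must lie within the limits.
        if requested < shmmin:
            return "size < SHMMIN"
        if shmmax is not None and requested > shmmax:
            return "size > SHMMAX"
        return None
    # The key already exists: asking for more than the segment holds fails.
    if requested > existing:
        return "size is greater than the existing segment"
    return None


if __name__ == "__main__":
    # Sizes taken from the strace and ipcs output above.
    print(shmget_einval_reason(30097568, existing=30095516))
```

With the numbers from this case, the requested 30097568 bytes exceed the 30095516 bytes of the existing segment, so the third rule is the one that fires.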

In our case, the sizes differ because one of the programs was compiled as 32-bit and the other as 64-bit, and the code uses long, an integer type whose width differs between the two.

Compiling both programs as 64-bit solves the problem.
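How a single long field changes the size of a shared structure can be shown with ctypes. The record layout below (an int id followed by a long counter) is purely hypothetical, since the article does not show the real structures; c_int32 and c_int64 stand in for long as seen by a 32-bit and a 64-bit build respectively.

```python
import ctypes


class Record32(ctypes.Structure):
    # Layout as a 32-bit build sees it: long is 4 bytes.
    _fields_ = [("id", ctypes.c_int32), ("counter", ctypes.c_int32)]


class Record64(ctypes.Structure):
    # Layout as a 64-bit Linux build sees it: long is 8 bytes.
    _fields_ = [("id", ctypes.c_int32), ("counter", ctypes.c_int64)]


def segment_size(record_type, count):
    """Bytes a program would request for `count` records of this layout."""
    return ctypes.sizeof(record_type) * count


if __name__ == "__main__":
    print(ctypes.sizeof(Record32), ctypes.sizeof(Record64))
    print(segment_size(Record32, 1000), segment_size(Record64, 1000))
```

Two programs that compute the segment size from the same source code therefore pass different sizes to shmget, and whichever one asks for more than the existing segment holds gets EINVAL.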

The -e trace option of strace deserves a special mention here.

To trace one specific system call, -e trace=xxx is enough. But sometimes we want to trace a whole class of system calls, such as all calls that take a file name or all calls related to memory allocation.

Typing in every system call name by hand makes it easy to miss some, so strace provides group names for several common sets of system calls.

-e trace=file traces calls related to file access (calls that take a file name as an argument)

-e trace=process traces calls related to process management, such as fork/exec/exit_group

-e trace=network traces calls related to network communication, such as socket/sendto/connect

-e trace=signal traces calls related to sending and handling signals, such as kill/sigaction

-e trace=desc traces calls related to file descriptors, such as write/read/select/epoll

-e trace=ipc traces calls related to inter-process communication, such as shmget

In most cases the group names above are all we need. When you do need to trace a specific system call, pay attention to how the C library implements it.

For example, we know that the fork system call creates a process, but in glibc a call to fork is actually mapped to the lower-level clone system call. With strace you therefore have to specify -e trace=clone; -e trace=fork will not match anything.

3. Performance analysis

Suppose we need to count the lines of code (both assembly and C) in the Linux 4.5.4 kernel. Here are two shell script implementations:

poor_script.sh:

#!/bin/bash
total_line=0
while read filename; do
    line=$(wc -l $filename | awk '{print $1}')
    ((total_line += line))
done <
