How to use strace to understand system calls on Linux

2025-04-16 Update. From: SLTechnology News&Howtos (Shulou.com)

Use strace to track the interaction between the user process and the Linux kernel.

A system call is the programmatic way a program requests a service from the kernel, and strace is a powerful tool that lets you trace the interaction between user processes and the Linux kernel.

To understand how the operating system works, you first need to understand how system calls work. One of the main functions of the operating system is to provide abstract mechanisms for user programs.

The operating system can be roughly divided into two modes:

Kernel mode: a powerful privileged mode used by the operating system kernel

User mode: the mode in which most user applications run

Users mostly work with command-line utilities and graphical user interfaces (GUIs) to perform everyday tasks. System calls work silently in the background, interacting with the kernel to get the job done.

System calls are very similar to function calls, which means that they both accept and process parameters and return values. The only difference is that the system call enters the kernel, while the function call does not. Switching from user space to kernel space is done using a special trap mechanism.

Most system calls are hidden from users behind system libraries (glibc on Linux systems). Although system calls are generic in nature, the mechanism for issuing one depends largely on the machine architecture.

This article explores some practical examples by using some general commands and using strace to analyze the system calls made by each command. These examples use Red Hat Enterprise Linux, but these commands should run the same on other Linux distributions:

[root@sandbox ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)
[root@sandbox ~]#
[root@sandbox ~]# uname -r
3.10.0-1062.el7.x86_64
[root@sandbox ~]#

First, make sure that the necessary tools are installed on the system. You can use the following rpm command to verify that strace is installed. If it is, you can use the -V option to check the version number of the strace utility:

[root@sandbox ~]# rpm -qa | grep -i strace
strace-4.12-9.el7.x86_64
[root@sandbox ~]#
[root@sandbox ~]# strace -V
strace -- version 4.12
[root@sandbox ~]#

If it is not installed, run the following command to install it:

yum install strace

For the purposes of this example, create a test directory in /tmp and use the touch command to create two files:

[root@sandbox ~]# cd /tmp/
[root@sandbox tmp]#
[root@sandbox tmp]# mkdir testdir
[root@sandbox tmp]#
[root@sandbox tmp]# touch testdir/file1
[root@sandbox tmp]# touch testdir/file2
[root@sandbox tmp]#

(I used the /tmp directory because everyone can access it, but you can choose another directory as needed.)

Use the ls command on the testdir directory to verify that the files have been created:

[root@sandbox tmp]# ls testdir/
file1  file2
[root@sandbox tmp]#

You probably use the ls command every day without realizing that system calls are at work underneath it. In the abstract, the command works as follows:

Command-line utility -> invokes functions from system libraries (glibc) -> invokes system calls

The ls command internally calls functions from the system library on Linux (that is, glibc). These libraries call system calls that do most of the work.
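The difference between a plain library function and a library function that wraps a system call can also be observed from a script. The following is a minimal sketch in Python (not part of the original walkthrough), assuming a Linux system with glibc. It uses the standard ctypes module to call two glibc functions directly: strlen, which does its work entirely in user space, and getpid, which is a thin wrapper around a system call.

```python
import ctypes
import os

# Load the C library that is already linked into the Python process
# (assumption: Linux with glibc, where CDLL(None) exposes libc symbols).
libc = ctypes.CDLL(None)

# strlen() is a pure library function: it runs in user space and
# never needs to enter the kernel.
name_length = libc.strlen(b"file1")

# getpid() is a glibc wrapper around the getpid system call; only the
# kernel knows the answer, so the call has to enter kernel mode.
pid_from_libc = libc.getpid()

print("strlen('file1') =", name_length)
print("getpid() via glibc =", pid_from_libc, "| via os.getpid() =", os.getpid())
```

Because strace reports system calls only, a call such as strlen would not show up in its output, whereas ltrace would list it.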

If you want to know which functions are called from the glibc library, use the ltrace command followed by the regular ls testdir/ command:

ltrace ls testdir/

If ltrace is not installed, install it with the following command:

yum install ltrace

A large amount of output will be piled on the screen; don't worry, just keep going. Some of the important library functions related to this example in the ltrace command output include:

opendir("testdir/") = { 3 }
readdir({ 3 }) = { 101879119, "." }
readdir({ 3 }) = { 134, ".." }
readdir({ 3 }) = { 101879120, "file1" }
strlen("file1") = 5
memcpy(0x1665be0, "file1\0", 6) = 0x1665be0
readdir({ 3 }) = { 101879122, "file2" }
strlen("file2") = 5
memcpy(0x166dcb0, "file2\0", 6) = 0x166dcb0
readdir({ 3 }) = nil
closedir({ 3 })

By looking at the output above, you can probably understand what is happening. The opendir library function opens a directory called testdir, followed by calls to the readdir function, which reads the contents of the directory. Finally, there is a call to the closedir function, which closes the directory that was opened earlier. Ignore the other strlen and memcpy functions for now.

You can see which library functions are being called, but this article will focus on system calls called by system library functions.

Similar to the above, to see which system calls are made, simply put strace before the ls testdir command, as shown below. Once again, a pile of output is dumped on your screen; an abridged version follows:

[root@sandbox tmp]# strace ls testdir/
execve("/usr/bin/ls", ["ls", "testdir/"], [/* 40 vars */]) = 0
brk(NULL) = 0x1f12000
...
write(1, "file1  file2\n", 13file1  file2
) = 13
close(1) = 0
munmap(0x7fd002c8d000, 4096) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
[root@sandbox tmp]#

The output on the screen after running strace is the list of system calls made to run the ls command. Each system call serves a specific purpose for the operating system, and they can be broadly grouped into the following categories:

Process management system calls

File management system calls

Directory and file system management system calls

Other system calls

An easier way to analyze the information dumped on your screen is to log the output to a file using strace's handy -o flag. Add a suitable file name after the -o flag and run the command again:

[root@sandbox tmp]# strace -o trace.log ls testdir/
file1  file2
[root@sandbox tmp]#

This time, no trace output clutters the screen: the ls command works as expected and shows the file names, while all of strace's output is logged to the file trace.log. Even for a simple ls command, the file has more than 100 lines:

[root@sandbox tmp]# ls -l trace.log
-rw-r--r--. 1 root root 7809 Oct 12 13:52 trace.log
[root@sandbox tmp]#
[root@sandbox tmp]# wc -l trace.log
114 trace.log
[root@sandbox tmp]#

Let's take a look at the first line of the trace.log file for this example:

execve("/usr/bin/ls", ["ls", "testdir/"], [/* 40 vars */]) = 0

The first word of the line, execve, is the name of the system call being executed.

The text within the parentheses is the list of arguments provided to the system call.

The number after the = sign (0 in this case) is the value returned by the execve system call.
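The three parts described above (name, arguments, return value) can be pulled apart mechanically. The following Python sketch is illustrative only: the function name parse_strace_line and the regular expression are assumptions made for this example, and the pattern only handles simple, complete lines, not unfinished calls or signal messages.

```python
import re

# Pattern for a simple, complete strace line: name(arguments) = result
# (assumption: no "<unfinished ...>" calls and no signal lines).
STRACE_LINE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>.+)$")

def parse_strace_line(line):
    """Return (name, args, return_value), or None if the line does not match."""
    match = STRACE_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("args"), match.group("ret")

first_line = 'execve("/usr/bin/ls", ["ls", "testdir/"], [/* 40 vars */]) = 0'
name, args, ret = parse_strace_line(first_line)
print("system call:", name)
print("arguments:  ", args)
print("return:     ", ret)
```

Lines that do not describe a system call, such as the final "+++ exited with 0 +++", simply return None.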

The current output doesn't seem too scary, does it? You can apply the same logic to understand other lines.

Now, narrow your focus to the single command you invoked, that is, ls testdir. You know the directory name used by the ls command, so why not grep for testdir in the trace.log file and see what you get? Let's look at each line of the result in detail:

[root@sandbox tmp]# grep testdir trace.log
execve("/usr/bin/ls", ["ls", "testdir/"], [/* 40 vars */]) = 0
stat("testdir/", {st_mode=S_IFDIR|0755, st_size=32, ...}) = 0
openat(AT_FDCWD, "testdir/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
[root@sandbox tmp]#

Thinking back to the analysis of execve above, can you tell what this system call does?

execve("/usr/bin/ls", ["ls", "testdir/"], [/* 40 vars */]) = 0

You don't need to memorize all the system calls or what they do, because you can refer to the documentation when you need it. Man pages to the rescue! Before running the man command, make sure the following package is installed:

[root@sandbox tmp]# rpm -qa | grep -i man-pages
man-pages-3.53-5.el7.noarch
[root@sandbox tmp]#

Remember that you need to add a 2 between the man command and the system call name. If you read man's own man page using man man, you will see that section 2 is reserved for system calls. Similarly, if you need information about a library function, add a 3 between man and the library function name.

The following are the manual's section numbers and the types of pages they contain:

1: executable programs or shell commands

2: system calls (functions provided by the kernel)

3: library calls (functions within program libraries)

4: special files (usually found in /dev)

Run the following man command with the system call name to view the documentation for the system call:

man 2 execve

According to the execve man page, this system call executes the program passed in its arguments (in this case, ls). Additional arguments can be passed to ls, such as testdir in this example. So this system call simply runs ls with testdir as the argument:

execve - execute program

DESCRIPTION
       execve() executes the program pointed to by filename

The next system call, named stat, takes testdir as its argument:

stat("testdir/", {st_mode=S_IFDIR|0755, st_size=32, ...}) = 0

Use man 2 stat to read the documentation. stat is the system call that gets a file's status. Remember, everything in Linux is a file, including a directory.
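To see the kind of information stat returns without reading raw trace output, a short Python sketch can help. It is not part of the original walkthrough, and it uses a temporary directory rather than /tmp/testdir so that it runs anywhere. Python's os.stat issues a stat-family system call (which exact variant is used depends on the Python and kernel versions).

```python
import os
import stat
import tempfile

# Create a throwaway directory so the example does not depend on /tmp/testdir.
with tempfile.TemporaryDirectory() as testdir:
    info = os.stat(testdir)                    # stat-family system call
    is_directory = stat.S_ISDIR(info.st_mode)  # same check as the S_IFDIR flag
    permissions = stat.S_IMODE(info.st_mode)   # permission bits, like 0755

print("is a directory:", is_directory)
print("permission bits:", oct(permissions))
print("size in bytes:", info.st_size)
```

The st_mode and st_size fields correspond to the st_mode=S_IFDIR|0755 and st_size=32 values that strace printed for testdir.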

Next, the openat system call opens testdir. Pay close attention to the 3 that is returned. This is a file descriptor, which will be used by later system calls:

openat(AT_FDCWD, "testdir/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
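The same kind of open can be tried from Python, whose os.open is a thin wrapper over the kernel's open family of calls. This is a sketch under the assumption of a Linux system (O_DIRECTORY and O_CLOEXEC are not available everywhere), and it opens a temporary directory instead of testdir. The kernel hands out the lowest free descriptor number, which is why a freshly started ls, with 0, 1, and 2 already in use, receives 3; inside another program the number may differ.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as testdir:
    # The same flags that ls passed to openat in the trace above.
    flags = os.O_RDONLY | os.O_NONBLOCK | os.O_DIRECTORY | os.O_CLOEXEC
    fd = os.open(testdir, flags)
    try:
        fd_is_valid = isinstance(fd, int) and fd >= 0
        print("file descriptor returned by the kernel:", fd)
    finally:
        os.close(fd)   # give the descriptor back (close system call)
```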

So far, so good. Now, open the trace.log file and go to the line following the openat system call. You will see the getdents system call being invoked, which does most of the work required to execute the ls testdir command. Now, grep for getdents in the trace.log file:

[root@sandbox tmp]# grep getdents trace.log
getdents(3, /* 4 entries */, 32768) = 112
getdents(3, /* 0 entries */, 32768) = 0
[root@sandbox tmp]#

The getdents man page describes it as "get directory entries", which is exactly what you want to do. Notice that the first argument to getdents is 3, the file descriptor returned by the openat system call above.
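The open-then-read-entries sequence can be reproduced in a few lines of Python. This is an illustrative sketch rather than what ls itself does: it assumes Linux, where os.listdir accepts an open directory descriptor, and it recreates file1 and file2 in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as testdir:
    # Recreate the two files from the walkthrough.
    for name in ("file1", "file2"):
        open(os.path.join(testdir, name), "w").close()

    fd = os.open(testdir, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Listing through a descriptor reads the directory entries via that fd.
        entries = sorted(os.listdir(fd))
    finally:
        os.close(fd)

print(entries)
```

The trace reported 4 entries because the kernel also returns "." and "..", which os.listdir leaves out.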

Now that you have the directory listing, you need a way to display it in your terminal. So, grep the log for another system call, write, which is used to write to the terminal:

[root@sandbox tmp]# grep write trace.log
write(1, "file1  file2\n", 13) = 13
[root@sandbox tmp]#

Among these parameters, you can see the file names that will be displayed: file1 and file2. With regard to the first parameter (1), remember that in Linux, when you run any process, three file descriptors are opened by default for it. The following are the default file descriptors:

0: standard input

1: standard output

2: standard error

Therefore, the write system call will display file1 and file2 on the standard display (this terminal, identified by 1).
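The write call from the trace can be issued directly from Python with os.write, which bypasses the buffering that print normally adds. A minimal sketch, assuming standard output is open as descriptor 1:

```python
import os

line = b"file1  file2\n"   # 13 bytes, as in the traced write call

# Write the bytes straight to file descriptor 1 (standard output).
bytes_written = os.write(1, line)

print("bytes written:", bytes_written)
```

The value returned by os.write is the number of bytes the kernel accepted, which is the same number strace shows after the = sign.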

Now you know which system calls did most of the work for the ls testdir/ command. But what about the other 100+ system calls in the trace.log file? The operating system has to do a lot of housekeeping to run a process, so much of what you see in the log file is process initialization and cleanup. Read through the entire trace.log file and try to understand how the ls command works.

Now that you know how to analyze system calls for a given command, you can apply this knowledge to other commands to understand which system calls are being executed. strace provides a number of useful command-line flags to make this easier, some of which are described below.

By default, strace does not include all system call information. However, it has a handy -v (verbose) option that provides additional information for each system call:

strace -v ls testdir

It is good practice to always use the -f option when running the strace command. It allows strace to trace any child processes created by the process currently being traced:

strace -f ls testdir
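A plain ls does not start child processes, so the effect of -f is easier to see with a program that does. The following Python sketch is a hypothetical test program, not part of the original walkthrough; it starts one child process that writes a line to standard output.

```python
import subprocess
import sys

# The parent (this script) starts a child process. Without -f, strace
# reports only the parent's system calls, not the child's write.
child = subprocess.run(
    [sys.executable, "-c", "import os; os.write(1, b'hello from the child\\n')"],
    check=True,
)
print("child exit status:", child.returncode)
```

If this were saved as, say, spawn_child.py (an assumed file name), running strace -f python3 spawn_child.py should include the child's system calls in the trace, while running it without -f should show only those of the parent.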

Suppose you only want the names of the system calls, the number of times each one ran, and the percentage of time spent in each. You can use the -c flag to get these statistics:

strace -c ls testdir/
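The call counts in such a summary can be approximated from a saved log as well. The following Python sketch uses a hypothetical helper, count_syscalls, and a handful of lines copied from the traces in this article; it counts calls only and does not reproduce the timing columns that strace -c prints.

```python
import re
from collections import Counter

# A few lines copied from the traces shown in this article.
sample_trace = """\
execve("/usr/bin/ls", ["ls", "testdir/"], [/* 40 vars */]) = 0
brk(NULL) = 0x1f12000
getdents(3, /* 4 entries */, 32768) = 112
getdents(3, /* 0 entries */, 32768) = 0
write(1, "file1  file2\\n", 13) = 13
close(1) = 0
close(2) = 0
+++ exited with 0 +++
"""

def count_syscalls(trace_text):
    """Count how often each system call name starts a line of strace output."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = re.match(r"(\w+)\(", line)
        if match:
            counts[match.group(1)] += 1
    return counts

for name, calls in count_syscalls(sample_trace).most_common():
    print(f"{calls:>5}  {name}")
```

To use it on a real log, the text of trace.log could be read from disk and passed to count_syscalls in place of sample_trace.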

Suppose you want to concentrate on a specific system call, such as open, and ignore the rest. You can use the -e flag followed by the system call name:

[root@sandbox tmp]# strace -e open ls testdir
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libacl.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libpcre.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libattr.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
file1  file2
+++ exited with 0 +++
[root@sandbox tmp]#

What if you want to concentrate on more than one system call? No problem: use the same -e flag and separate the system call names with a comma. For example, to see the write and getdents system calls:

[root@sandbox tmp]# strace -e write,getdents ls testdir
getdents(3, /* 4 entries */, 32768) = 112
getdents(3, /* 0 entries */, 32768) = 0
write(1, "file1  file2\n", 13file1  file2
) = 13
+++ exited with 0 +++
[root@sandbox tmp]#

So far, the examples have traced commands that were run explicitly. But what about commands that have already been started and are still running? What if, for example, you want to trace daemons, which are long-running processes? For this, strace provides the special -p flag, to which you pass a process ID.

Instead of running strace on a daemon, this example uses the cat command. Given a file name as an argument, cat normally displays the contents of that file. If no argument is given, cat simply waits for the user to enter text at the terminal. After text is entered, it repeats the given text until the user presses Ctrl + C to exit.

Run the cat command from a terminal; it will show you a prompt and wait there (remember that cat is still running and has not exited):

[root@sandbox tmp]# cat

From another terminal, use the ps command to find the process identifier (PID):

[root@sandbox ~]# ps -ef | grep cat
root     22443 20164  0 14:19 pts/0    00:00:00 cat
root     22482 20300  0 14:20 pts/1    00:00:00 grep --color=auto cat
[root@sandbox ~]#

Now, run strace on the running process with the -p flag and the PID (which you found above using ps). After it starts, strace reports that it has attached to the process and shows the PID. strace is now tracing the system calls made by the cat command. The first system call you see is read, which is waiting for input from file descriptor 0 (standard input, which is the terminal where cat is running):

[root@sandbox ~]# strace -p 22443
strace: Process 22443 attached
read(0,

Now, go back to the terminal where you ran the cat command and enter some text. I entered x0x0 for demonstration purposes. Notice how cat simply repeats what I typed, so x0x0 appears twice: the first is my input, and the second is the output repeated by the cat command:

[root@sandbox tmp]# cat
x0x0
x0x0

Return to the terminal where strace is attached to the cat process. You now see two additional system calls: the earlier read system call, which has now read x0x0 from the terminal, and a write, which wrote x0x0 back to the terminal. These are followed by a new read, which is waiting for more input from the terminal. Note that standard input (0) and standard output (1) both refer to the same terminal:

[root@sandbox ~]# strace -p 22443
strace: Process 22443 attached
read(0, "x0x0\n", 65536) = 5
write(1, "x0x0\n", 5) = 5
read(0,
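The read and write pattern in this trace is a simple copy loop. The following Python sketch imitates it with os.read and os.write; the function name copy_until_eof is an assumption made for this example, and pipes stand in for the terminal so that the code runs unattended.

```python
import os

def copy_until_eof(src_fd, dst_fd, bufsize=65536):
    """Copy bytes from src_fd to dst_fd until read returns no data (EOF)."""
    total = 0
    while True:
        chunk = os.read(src_fd, bufsize)   # blocks until data arrives or EOF
        if not chunk:
            break
        # os.write may accept fewer bytes than offered, so loop until done.
        while chunk:
            written = os.write(dst_fd, chunk)
            chunk = chunk[written:]
            total += written
    return total

# Use pipes instead of a terminal so the sketch needs no keyboard input.
in_read, in_write = os.pipe()
out_read, out_write = os.pipe()
os.write(in_write, b"x0x0\n")
os.close(in_write)                         # closing the writer signals EOF
copied = copy_until_eof(in_read, out_write)
os.close(out_write)
echoed = os.read(out_read, 65536)
os.close(in_read)
os.close(out_read)
print(copied, echoed)
```

Calling copy_until_eof(0, 1) instead would read from standard input and echo to standard output, much like cat with no arguments.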

Imagine how helpful it would be to run strace on the daemon to see all its operations in the background. Press Ctrl + C to kill cat; since the process is no longer running, this will also terminate your strace session.

To see a timestamp for every system call, simply use the -t option with strace:

[root@sandbox ~]# strace -t ls testdir/
14:24:47 execve("/usr/bin/ls", ["ls", "testdir/"], [/* 40 vars */]) = 0
14:24:47 brk(NULL) = 0x1f07000
14:24:47 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2530bc8000
14:24:47 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
14:24:47 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3

What if you want to know the time elapsed between system calls? strace has a handy -r option that prints a relative timestamp on entry to each system call, that is, the time that has passed since the previous system call began. Pretty useful, isn't it?

[root@sandbox ~]# strace -r ls testdir/
0.000000 execve("/usr/bin/ls", ["ls", "testdir/"], [/* 40 vars */]) = 0
0.000368 brk(NULL) = 0x1966000
0.000073 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb6b1155000
0.000047 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
0.000119 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
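Relative timestamps are easy to post-process when a trace is long. The following Python sketch is illustrative only: the helper name largest_gap is an assumption, and the sample lines are copied from the strace -r output of this article. It finds the entry that was preceded by the longest gap.

```python
# Lines in the format produced by strace -r (values copied from this article).
relative_trace = """\
0.000000 execve("/usr/bin/ls", ["ls", "testdir/"], [/* 40 vars */]) = 0
0.000368 brk(NULL) = 0x1966000
0.000073 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb6b1155000
0.000047 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
0.000119 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
"""

def largest_gap(trace_text):
    """Return (seconds, call) for the entry with the largest relative timestamp."""
    best = (0.0, "")
    for line in trace_text.splitlines():
        stamp, _, rest = line.strip().partition(" ")
        try:
            seconds = float(stamp)
        except ValueError:
            continue   # skip lines without a leading timestamp
        if seconds > best[0]:
            best = (seconds, rest)
    return best

seconds, call = largest_gap(relative_trace)
print(f"{seconds:.6f}s elapsed before: {call}")
```

In this sample the largest gap precedes the brk call, which means most of the time in these five lines was spent between the start of execve and the start of brk.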
