
What is Zero copy Technology in Linux


This article shares what zero-copy technology is in Linux. The editor finds it quite practical, so it is offered here as a reference; follow along to have a look.

Introduction

File download is a basic feature when writing a server program (a web server or a file server). The server's task is to send a file on the host's disk out through a connected socket, without modifying it, which is usually done with code like the following:

while ((n = read(diskfd, buf, BUF_SIZE)) > 0)
    write(sockfd, buf, n);

The basic operation is a loop that reads the file contents from disk into a buffer and then sends the buffer contents to the socket. However, because Linux I/O is buffered by default, this mainly relies on two system calls, read and write, and we do not see what the operating system does inside them. In fact, the I/O above causes the data to be copied multiple times.
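For concreteness, a fuller version of this traditional loop might look like the sketch below. It assumes diskfd and sockfd are an already-open file descriptor and a connected socket (names chosen here for illustration), and adds partial-write handling for completeness:

#include <unistd.h>

#define BUF_SIZE 4096

/* Minimal sketch of the traditional copy loop: every chunk is copied from
 * the kernel page cache into buf, then from buf back into the kernel's
 * socket buffer. */
static int copy_file_to_socket(int diskfd, int sockfd)
{
    char buf[BUF_SIZE];
    ssize_t n;

    while ((n = read(diskfd, buf, BUF_SIZE)) > 0) {
        ssize_t off = 0;
        while (off < n) {                       /* write() may be partial */
            ssize_t w = write(sockfd, buf + off, n - off);
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;
}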

When an application accesses a piece of data, the operating system first checks whether the file has been accessed recently and whether its contents are already cached in the kernel buffer. If so, the operating system copies the contents of the kernel buffer directly into the user-space buffer at the buf address supplied to the read system call. If not, the operating system first copies the data from disk into the kernel buffer, which today relies mainly on DMA, and then copies the contents of the kernel buffer into the user buffer.

Next, the write system call copies the contents of the user buffer into the kernel buffer associated with the network stack, and finally the socket sends the contents of that kernel buffer to the network card. Having said all that, it is easier to see in a picture:

Data copy

As the figure above shows, four data copies take place. Even with DMA handling the communication with the hardware, the CPU still has to perform two of the copies. At the same time, many context switches occur between user mode and kernel mode, which undoubtedly adds to the CPU's burden.

In this process we did not modify the file contents at all, so copying the data back and forth between kernel space and user space is pure waste, and zero copy exists mainly to eliminate this inefficiency.

What is zero copy technology (zero-copy)?

The main goal of zero copy is to keep the CPU from copying data from one storage area to another: use various zero-copy techniques to avoid having the CPU do large amounts of data copying, reduce unnecessary copies, or hand this kind of simple data-transfer work off to other components, leaving the CPU free to focus on other tasks. This makes more efficient use of system resources.

Let's go back to the example in the introduction: how can we reduce the number of data copies? An obvious focus is to reduce the data copied back and forth between kernel space and user space, which leads to one class of zero copy:

Make the data transfer avoid passing through user space.

Use mmap

One way we can reduce the number of copies is to call mmap () instead of the read call:

buf = mmap(diskfd, len);
write(sockfd, buf, len);

The application calls mmap(), the data on disk is copied into a kernel buffer by DMA, and the operating system then shares this kernel buffer with the application, so the kernel buffer's contents no longer need to be copied into user space. The application calls write(), and the operating system copies the contents of the kernel buffer directly into the socket buffer; all of this happens in kernel mode. Finally, the socket buffer hands the data to the network card. Again, a picture makes it clearer:

Mmap
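The two-argument mmap() above is shorthand; the real mmap(2) takes six arguments. A minimal sketch of the same idea with the actual interface (diskfd, sockfd, and the file length len are assumed to be known) might look like this:

#include <sys/mman.h>
#include <unistd.h>

/* Sketch: map the file read-only, then hand the mapped region to write().
 * The data is still copied once by the CPU, from the page cache into the
 * socket buffer, but the copy into a user-space buffer is gone. */
static int send_with_mmap(int diskfd, int sockfd, size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ, MAP_SHARED, diskfd, 0);
    if (buf == MAP_FAILED)
        return -1;

    size_t off = 0;
    while (off < len) {
        ssize_t w = write(sockfd, (char *)buf + off, len - off);
        if (w < 0) {
            munmap(buf, len);
            return -1;
        }
        off += (size_t)w;
    }
    munmap(buf, len);
    return 0;
}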

Using mmap instead of read clearly saves one copy, which improves efficiency when a lot of data is being copied. But mmap has its price: it comes with hidden traps. For example, if your program has mapped a file and another process truncates the file (truncate), the write system call will be terminated by a SIGBUS signal because it accessed an invalid address. By default SIGBUS kills the process and produces a core dump, and having a server die this way means real losses.

Usually we use the following solutions to avoid this problem:

1. Install a signal handler for SIGBUS

When a SIGBUS signal arrives, the handler simply returns, the write system call returns the number of bytes written before it was interrupted, and errno is set to success. But this is a poor way to handle it, because it does not address the core of the problem.
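To make this first workaround concrete, here is a minimal sketch of installing a SIGBUS handler with sigaction(). Note that a handler which literally just returns would re-execute the faulting access, so this sketch uses siglongjmp to get back to the caller instead; it is illustrative only, for exactly the reason given above:

#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf sigbus_env;

/* Jump back to the code that touched the mapping instead of dying. */
static void sigbus_handler(int sig)
{
    (void)sig;
    siglongjmp(sigbus_env, 1);
}

static int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigbus_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Usage sketch around the write() on the mapped buffer:
 *
 *   if (sigsetjmp(sigbus_env, 1) == 0)
 *       write(sockfd, buf, len);   // may fault if the file is truncated
 *   else
 *       // the mapping vanished underneath us; report a short send
 */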

2. Use file lease locks

This is the method usually used: take a lease lock on the file descriptor by requesting a lease on the file from the kernel. When another process wants to truncate the file, the kernel sends us a real-time RT_SIGNAL_LEASE signal, telling us that it is about to break the lease we hold on the file. This way our write system call is interrupted before the program accesses invalid memory and gets killed by SIGBUS; write returns the number of bytes already written and errno is set to success.

We should take the lease before mmap-ing the file and release it after we are done with it:

if (fcntl(diskfd, F_SETSIG, RT_SIGNAL_LEASE) == -1) {
    perror("kernel lease set signal");
    return -1;
}

/* l_type can be F_RDLCK or F_WRLCK to take the lease */
/* l_type can be F_UNLCK to release it */
if (fcntl(diskfd, F_SETLEASE, l_type)) {
    perror("kernel lease set type");
    return -1;
}

Use sendfile

Starting with version 2.1 of the kernel, Linux introduced sendfile to simplify operations:

#include <sys/sendfile.h>

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

The sendfile() system call transfers file contents (bytes) between the descriptor in_fd, which refers to the input file, and the descriptor out_fd, which refers to the output file. The descriptor out_fd must refer to a socket, and the file that in_fd refers to must support mmap. These restrictions limit sendfile's use: it can only move data from a file to a socket, not the other way around.

The use of sendfile not only reduces the number of data copies, but also reduces context switching, and data transfer always occurs only in kernel space.

Sendfile system call procedure
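A minimal usage sketch follows (diskfd, sockfd, and the file size count are assumed to be known; sendfile() may transfer fewer bytes than requested, so it is called in a loop):

#include <sys/sendfile.h>

/* Sketch: push count bytes of diskfd to sockfd entirely inside the kernel.
 * Because offset is non-NULL, the kernel updates it after each call, so the
 * loop simply retries until everything has been handed to the socket. */
static int send_with_sendfile(int diskfd, int sockfd, size_t count)
{
    off_t offset = 0;

    while ((size_t)offset < count) {
        ssize_t sent = sendfile(sockfd, diskfd, &offset,
                                count - (size_t)offset);
        if (sent < 0)
            return -1;
        if (sent == 0)
            break;              /* nothing left to send */
    }
    return 0;
}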

What happens if another process truncates the file while we are calling sendfile? If we have not installed any signal handlers, the sendfile call simply returns the number of bytes it transferred before being interrupted, and errno is set to success. If we take a lease on the file before calling sendfile, sendfile behaves the same way, and we also receive the RT_SIGNAL_LEASE signal.

So far we have reduced the number of data copies, but one copy remains: the copy from the page cache into the socket buffer. Can that copy be eliminated as well?

With the help of hardware, it can. Previously we copied the data from the page cache into the socket buffer, but in fact all we need to do is append a buffer descriptor and the data length to the socket buffer; the DMA controller can then gather the data directly from the page cache and send it to the network card.

To sum up, the sendfile system call uses the DMA engine to copy the file contents into a kernel buffer, and then a buffer descriptor carrying the file position and length is appended to the socket buffer. This step does not copy the data from the kernel buffer into the socket buffer; instead the DMA engine copies the data from the kernel buffer straight to the protocol engine, avoiding the last copy.

Sendfile with DMA

However, this gather-copy capability requires support from the hardware and its driver.

Use splice

Sendfile is only suitable for copying data from a file to a socket, which limits its use. Linux introduced the splice system call in version 2.6.17 to move data between two file descriptors:

#define _GNU_SOURCE          /* See feature_test_macros(7) */
#include <fcntl.h>

ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out,
               size_t len, unsigned int flags);

The splice call moves data between two file descriptors without copying it back and forth between kernel space and user space. It moves len bytes of data from fd_in to fd_out, but one of the two descriptors must be a pipe, which is splice's main limitation at present. The flags parameter can take the following values:

SPLICE_F_MOVE: try to move the data instead of copying it. This is only a hint to the kernel: if the kernel cannot move the data from the pipe, or the pipe's buffers are not full pages, the data is copied anyway. The initial implementation had problems, so this flag has been a no-op since 2.6.21; a later Linux version may implement it again.

SPLICE_F_NONBLOCK: the splice operation does not block. However, if the file descriptors have not been put into non-blocking I/O mode, the splice call may still block.

SPLICE_F_MORE: more data will follow in a subsequent splice call.

The splice call builds on Linux's pipe buffer mechanism, so at least one of the two descriptors must be a pipe.
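Because one side of each splice() call must be a pipe, moving data from a file to a socket takes two splice() calls through an intermediate pipe. A minimal sketch (diskfd, sockfd, and len are assumed to be known):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Sketch: file -> pipe -> socket, all inside the kernel. The pipe acts as
 * the kernel buffer that splice() requires on one side of each call. */
static int send_with_splice(int diskfd, int sockfd, size_t len)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;

    while (len > 0) {
        /* Move a chunk from the file into the pipe. */
        ssize_t in = splice(diskfd, NULL, pipefd[1], NULL, len,
                            SPLICE_F_MOVE | SPLICE_F_MORE);
        if (in <= 0)
            break;

        /* Drain that chunk from the pipe into the socket. */
        ssize_t remaining = in;
        while (remaining > 0) {
            ssize_t out = splice(pipefd[0], NULL, sockfd, NULL, remaining,
                                 SPLICE_F_MOVE | SPLICE_F_MORE);
            if (out <= 0) {
                close(pipefd[0]);
                close(pipefd[1]);
                return -1;
            }
            remaining -= out;
        }
        len -= (size_t)in;
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return len == 0 ? 0 : -1;
}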

The zero-copy techniques above all work by reducing the copying of data between user space and kernel space, but sometimes data must be copied between the two. In that case we can only work on the timing of the copy. Linux usually uses copy-on-write to reduce the overhead; this is commonly abbreviated COW.

Due to space constraints, this article does not cover copy-on-write in detail. Roughly, it works like this: if multiple programs access the same piece of data at the same time, each holds a pointer to it and, from its own point of view, owns the data exclusively. Only when a program needs to modify the data is the content actually copied into that program's own address space, where it becomes the program's private data. A program that never modifies the data never needs its own copy, which saves copying. Copy-on-write deserves an article of its own.
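For a rough feel of the idea, the classic example is fork(): after the call, parent and child share all pages copy-on-write, and a private copy of a page is made only when one side writes to it. A minimal, hedged sketch:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Parent and child share this buffer copy-on-write after fork():
     * no page is duplicated until one of them actually writes to it. */
    char *data = malloc(4096);
    strcpy(data, "shared until written");

    pid_t pid = fork();
    if (pid == 0) {
        /* This write triggers the copy; only now does the child get its
         * own private page, leaving the parent's page untouched. */
        data[0] = 'S';
        printf("child : %s\n", data);
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent: %s\n", data);   /* still the original contents */
    free(data);
    return 0;
}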

There are other zero-copy techniques as well, for example passing the O_DIRECT flag with traditional Linux I/O to bypass the automatic caching, and the still-immature fbufs technique. This article does not cover every zero-copy technique, only the common ones. If you are interested, you can study the rest on your own; mature server projects generally also modify the I/O-related parts of the kernel themselves to increase their data transfer rates.

Thank you for reading! This concludes the article on "what is zero-copy technology in Linux". I hope the content above has been of some help and lets you learn something new. If you think the article is good, share it so more people can see it!
