How to Achieve Zero Copy on Linux

2025-07-13 Update. From: SLTechnology News&Howtos. Shulou (Shulou.com), 05/31 report.

This article explains how zero copy works on Linux: why an ordinary read/write loop copies data more often than necessary, and how mmap, sendfile, and splice reduce those copies.

To build an intuition for zero copy quickly, consider a common scenario. File download is a basic feature of almost any server program (a web server or a file server).

In this case, the server's job is to send a file from its local disk out through a connected socket, without modifying the contents.

We usually do this with the following code:

while ((n = read(diskfd, buf, BUF_SIZE)) > 0)
    write(sockfd, buf, n);

The loop repeatedly reads the file contents from disk into a buffer and then sends the buffer to the socket. Because I/O on Linux is buffered by default, this is buffered I/O.

Only two system calls are involved, read and write, and we do not see what the operating system does inside them. In fact, the I/O operation above copies the data several times.

When an application accesses a piece of data, the operating system first checks whether the file has been accessed recently and whether its contents are already cached in a kernel buffer.

If so, the operating system copies the contents of the kernel buffer directly into the user-space buffer at the address buf passed to the read system call.

If not, the operating system first copies the data from disk into the kernel buffer (today this step is mostly done by DMA) and then copies the contents of the kernel buffer into the user buffer.

Next, the write system call copies the contents of the user buffer into the kernel buffer associated with the network stack, and finally the contents of that socket buffer are sent to the network card.
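The read/write path described above can be sketched as a complete function. This is a minimal sketch, not the author's code: the names copy_read_write and write_all and the 4096-byte BUF_SIZE are assumptions chosen for illustration. It also handles short writes and EINTR, which the one-line loop ignores.

```c
#include <errno.h>
#include <unistd.h>

#define BUF_SIZE 4096   /* illustrative size, not taken from the article */

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd to out_fd through a user-space buffer.
 * Returns the number of bytes copied, or -1 on error. */
ssize_t copy_read_write(int in_fd, int out_fd)
{
    char buf[BUF_SIZE];
    ssize_t total = 0;

    for (;;) {
        /* Copy: kernel buffer -> user buffer */
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            break;                      /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Copy: user buffer -> kernel socket buffer */
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
    return total;
}
```

Every iteration crosses the user/kernel boundary twice, which is exactly the overhead the techniques in the rest of the article try to remove.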

A figure makes the whole process easier to follow:

[Figure: Data copy]

As the figure shows, the data is copied four times. Even though DMA handles the transfers to and from the hardware, the CPU still has to perform two of those copies.

There are also several context switches between user mode and kernel mode, which further increases the load on the CPU.

Throughout this process we never modify the file contents, so copying the data back and forth between kernel space and user space is pure waste. Zero copy exists mainly to eliminate this inefficiency.

What is zero copy technology (zero-copy)?

The main goal of zero copy is to keep the CPU from copying data from one region of memory to another.

Zero-copy techniques avoid making the CPU do bulk data copies: they eliminate unnecessary copies, or hand this kind of simple data movement to other components, so that the CPU is free for other work. As a result, system resources are used more efficiently.

Back to the example above: how can we reduce the number of copies? An obvious target is the data copied back and forth between kernel space and user space. This leads to one class of zero copy: transferring data without passing it through user space.

Use mmap

One way to reduce the number of copies is to call mmap() instead of read():

buf = mmap(NULL, len, PROT_READ, MAP_PRIVATE, diskfd, 0);
write(sockfd, buf, len);

When the application calls mmap(), the data on disk is copied into a kernel buffer by DMA, and the operating system then shares that kernel buffer with the application, so the contents of the kernel buffer do not need to be copied into user space.

When the application calls write(), the operating system copies the contents of the kernel buffer directly into the socket buffer, entirely in kernel mode. Finally, the data in the socket buffer is sent to the network card.

Again, a figure makes this clearer:

[Figure: mmap]

Using mmap instead of read eliminates one copy, which noticeably improves efficiency when a large amount of data is being transferred.
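A fuller version of the mmap approach might look like the following. This is a sketch under stated assumptions: the function name send_with_mmap is hypothetical, the whole file is mapped at once, and no protection against concurrent truncation is included.

```c
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Send the whole file behind file_fd to out_fd via a read-only mapping.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_with_mmap(int file_fd, int out_fd)
{
    struct stat st;
    if (fstat(file_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;                       /* mmap() rejects a zero length */

    size_t len = (size_t)st.st_size;
    char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, file_fd, 0);
    if (map == MAP_FAILED)
        return -1;

    size_t sent = 0;
    while (sent < len) {
        /* The kernel copies from the shared page cache pages
         * straight into the socket buffer. */
        ssize_t n = write(out_fd, map + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            munmap(map, len);
            return -1;
        }
        sent += (size_t)n;
    }
    munmap(map, len);
    return (ssize_t)sent;
}
```

For very large files, mapping and sending in fixed-size windows would keep address-space use bounded; that refinement is left out here for brevity.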

But mmap comes at a price, and it has some hidden traps.

For example, if your program maps a file and another process then truncates that file, the write system call is terminated by a SIGBUS signal for accessing an invalid address.

By default, SIGBUS kills the process and produces a core dump. Losing a server process this way can be costly.

There are two common ways to avoid this problem:

① Install a signal handler for SIGBUS

When SIGBUS arrives, the handler simply returns; the write system call returns the number of bytes written before it was interrupted, and errno is set to success. This is a poor solution, however, because it does not address the root cause of the problem.

② Use a file lease

This is the approach we usually take: we request a lease on the file descriptor from the kernel.

When another process tries to truncate the file, the kernel sends us a real-time signal, RT_SIGNAL_LEASE, telling us that it is breaking the read or write lease we hold on the file. (RT_SIGNAL_LEASE is a signal number chosen by the application, not a constant from the system headers.)

In this way, the write system call is interrupted before the program accesses invalid memory and is killed by SIGBUS. write returns the number of bytes written so far and sets errno to success.

Acquire the lease before mapping the file and release it once you are done with the file:

if (fcntl(diskfd, F_SETSIG, RT_SIGNAL_LEASE) == -1) {
    perror("kernel lease set signal");
    return -1;
}
/* l_type can be F_RDLCK or F_WRLCK to take the lease */
/* l_type can be F_UNLCK to release the lease */
if (fcntl(diskfd, F_SETLEASE, l_type)) {
    perror("kernel lease set type");
    return -1;
}

Use sendfile

Starting with kernel version 2.1 (and available in stable releases from 2.2), Linux introduced sendfile to simplify this operation:

#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

The sendfile() system call transfers file contents (bytes) from the descriptor in_fd, which represents the input file, to the descriptor out_fd, which represents the output.

The descriptor out_fd must refer to a socket (since Linux 2.6.33 it may be any file), and in_fd must refer to a file that supports mmap()-like operations, so it cannot itself be a socket.

These restrictions limit where sendfile can be used: it can pass data from a file to a socket, but not in the other direction.

sendfile not only reduces the number of data copies but also reduces context switches, and the data transfer takes place entirely in kernel space.

[Figure: sendfile system call procedure]

What happens if another process truncates the file while we are calling sendfile? Assuming we have not installed any signal handlers, the sendfile call simply returns the number of bytes it transferred before being interrupted, and errno is set to success.

If we take a lease on the file before calling sendfile, sendfile behaves the same way, and we also receive the RT_SIGNAL_LEASE signal.

So far we have reduced the number of data copies, but one copy remains: the copy from the page cache to the socket buffer. Can this copy be eliminated too?

With help from the hardware, it can. Previously, we copied the data from the page cache into the socket buffer.

In fact, all we need to do is pass a buffer descriptor and the data length to the socket buffer, so that the DMA controller can gather the data directly from the page cache and send it to the network.

To sum up: the sendfile system call uses the DMA engine to copy the file contents into the kernel buffer, and then appends a buffer descriptor carrying the location and length of the data to the socket buffer.

This step does not copy the data from the kernel buffer to the socket buffer. Instead, the DMA engine copies the data directly from the kernel buffer to the protocol engine, which avoids the last remaining copy.

[Figure: sendfile with DMA]

However, this gather operation requires support from both the hardware and the driver.

Use splice

sendfile is only suitable for copying data from a file to a socket, which limits its use.

Linux introduced the splice system call in version 2.6.17 to move data between two file descriptors:

#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <fcntl.h>
ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);

The splice call moves data between two file descriptors without copying it back and forth between kernel space and user space.

It moves up to len bytes from fd_in to fd_out, but one of the two descriptors must be a pipe, which is currently one of the limitations of splice.

The flags parameter can take the following values:

SPLICE_F_MOVE: try to move pages instead of copying them. This is only a hint to the kernel: if the kernel cannot move the pages out of the pipe, or if the pipe buffers do not refer to full pages, the data is still copied.

The initial implementation had some problems, so this flag has had no effect since 2.6.21; a later version of Linux may implement it properly.

SPLICE_F_NONBLOCK: do not block on I/O. This makes the pipe side of the splice operation non-blocking, but the call may still block if the file descriptors involved are not themselves in non-blocking mode (O_NONBLOCK).

SPLICE_F_MORE: more data will follow in a subsequent splice call.

The splice call builds on the pipe buffer mechanism provided by Linux, which is why at least one of the descriptors must be a pipe.
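Because one side must be a pipe, moving data from a file to a socket with splice takes two steps: file to pipe, then pipe to socket. The sketch below illustrates this under stated assumptions: the function name splice_copy is hypothetical, both descriptors are blocking, and in_fd is read from its current file offset.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Move up to len bytes from in_fd to out_fd through a temporary pipe.
 * Returns the number of bytes moved, or -1 on error. */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    ssize_t total = 0;
    while (len > 0) {
        /* Step 1: in_fd -> pipe (bounded by the pipe's capacity). */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len,
                           SPLICE_F_MOVE | SPLICE_F_MORE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break;                      /* end of input */
        len -= (size_t)n;

        /* Step 2: pipe -> out_fd, draining everything just queued. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (m <= 0) {
                total = -1;
                goto done;
            }
            n -= m;
            total += m;
        }
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}
```

A long-running server would probably create the pipe once and reuse it across transfers rather than opening and closing it per call.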

All of the zero-copy techniques above work by reducing data copies between user space and kernel space, but sometimes data really does have to be copied between the two.

In that case, the only thing left to optimize is when the copy happens.

Linux usually relies on copy-on-write (COW) to reduce this overhead: pages are shared until one side writes to them, and only then is a private copy made.
