
The usage of Linux Zero copy Technology

Updated: 2025-02-14. Source: SLTechnology News&Howtos (shulou), Servers


This article explains how zero-copy techniques are used on Linux: why they are needed, what the term really means, and which methods are available, each with its drawbacks.

1. Why do we need zero copy?

The standard I/O interfaces of a traditional Linux system (read, write) are based on data copying: data moves between kernel space and user space through copy_to_user or copy_from_user. The advantage is that the intermediate kernel cache reduces the number of disk I/O operations. The disadvantage is just as obvious: copying large amounts of data and switching frequently between user mode and kernel mode consume a lot of CPU time and seriously hurt data transfer performance. Some measurements report that in the Linux kernel protocol stack this copying accounts for 57.1% of the total packet processing time.

2. What is zero copy?

Zero copy is one answer to this problem: it relieves the CPU by avoiding copy operations wherever possible. Common zero-copy techniques on Linux fall into two categories: those that remove unnecessary copies in specific scenarios, and those that optimize the copy path as a whole. Seen this way, zero copy does not literally mean "0" copies. It is more of a design idea, and many zero-copy techniques are optimizations built on that idea.

3. Several methods of zero copy

(1) The original data copy operation

Before looking at the zero-copy methods, consider what the original Linux data copy path looks like. Suppose an application needs to read the contents of a disk file and send them over the network, like this:

/* copy the file to the socket one buffer at a time */
while ((n = read(diskfd, buf, BUF_SIZE)) > 0)
    write(sockfd, buf, n);

The whole process then goes through four steps:

1) read copies the data from the disk file into a buffer allocated by the kernel, using DMA.

2) The data is copied from the kernel buffer into the user-mode buffer.

3) write copies the data from the user-mode buffer into the socket buffer allocated by the kernel protocol stack.

4) The data is copied from the socket buffer to the NIC, using DMA.

At least four data copies therefore take place. Two of them are DMA transfers between memory and hardware, in which the CPU is not directly involved. The other two are performed by the CPU.
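For reference, the loop above can be written as a complete helper. This is only a sketch of the traditional path: the name copy_read_write and the 4096-byte buffer size are assumptions made for illustration, and the extra inner loop handles the case where write accepts fewer bytes than requested.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd through a user-space buffer.
 * Every chunk crosses the user/kernel boundary twice (read, then write).
 * Returns the number of bytes copied, or -1 on error. */
ssize_t copy_read_write(int in_fd, int out_fd)
{
    char buf[4096];               /* assumed buffer size */
    ssize_t total = 0, n;

    assert(in_fd >= 0 && out_fd >= 0);

    while ((n = read(in_fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;
        }
        ssize_t off = 0;
        while (off < n) {         /* write may accept only part of the chunk */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

Each iteration costs two system calls and two CPU copies, which is exactly the overhead the methods below try to remove.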

Method 1: User mode direct I/O

This method lets applications or library functions running in user mode access the hardware device directly. Data is transferred without passing through kernel buffers, and apart from the necessary virtual memory setup the kernel takes no part in the transfer. Bypassing the kernel in this way can improve performance considerably.

Defects:

1) This approach only suits applications that do not need kernel buffering. Such applications usually keep their own data cache inside the process address space and are known as self-caching applications; database management systems are the typical example.

2) This method operates on disk I/O directly. Because the disk is much slower than the CPU, the CPU sits idle while it waits for each transfer, which wastes resources. Solving this requires combining direct I/O with asynchronous I/O.
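As a rough illustration, the closest standard Linux interface is the O_DIRECT open flag. Note the assumption: O_DIRECT bypasses the kernel page cache rather than the kernel itself, so it is only an approximation of the method described above. The name read_direct and the 4096-byte alignment (DIO_ALIGN) are hypothetical; real code should query the alignment the device requires.

```c
#define _GNU_SOURCE               /* O_DIRECT is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define DIO_ALIGN 4096            /* assumed alignment for buffer and length */

/* Read up to len bytes from the start of path without going through the
 * page cache. len must be a multiple of DIO_ALIGN. On success *out points
 * to an aligned buffer that the caller must free(). Returns the number of
 * bytes read, or -1 on error (for example when the filesystem does not
 * support O_DIRECT). */
ssize_t read_direct(const char *path, size_t len, void **out)
{
    assert(len > 0 && len % DIO_ALIGN == 0);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    void *buf = NULL;
    if (posix_memalign(&buf, DIO_ALIGN, len) != 0) {
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, buf, len);   /* this call still blocks on the disk */
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return n;
}
```

The read call here is synchronous, which is the second defect listed above: the process waits for the disk on every call unless the reads are issued asynchronously.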

Method 2: MMap

This method replaces read with mmap, which removes one copy operation:

buf = mmap(NULL, len, PROT_READ, MAP_PRIVATE, diskfd, 0);  /* map the file instead of reading it */
write(sockfd, buf, len);

When the application calls mmap, the data in the disk file is copied by DMA into a kernel buffer, and the operating system shares that buffer with the application, so no copy into user space is needed. When the application calls write, the operating system copies the data directly from the kernel buffer to the socket buffer, and finally the data is copied to the network card by DMA.
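A fuller sketch of the mmap plus write pattern is shown below. The function name send_file_mmap is an assumption for illustration, and out_fd stands for any writable descriptor such as a connected socket.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Send the whole file at path to out_fd using mmap + write.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_file_mmap(const char *path, int out_fd)
{
    assert(path != NULL && out_fd >= 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {        /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *buf = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    if (buf == MAP_FAILED)
        return -1;

    size_t off = 0;
    while (off < len) {           /* write may be partial: loop until done */
        ssize_t w = write(out_fd, buf + off, len - off);
        if (w < 0) {
            munmap(buf, len);
            return -1;
        }
        off += (size_t)w;
    }
    munmap(buf, len);
    return (ssize_t)off;
}
```

The sketch does nothing to protect against the file being truncated while it is mapped, which is the trap discussed under the defects of this method.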

Defects:

1) mmap hides a trap. If a file is truncated by another process while it is mapped, the write system call accesses an address that is no longer valid and is terminated by a SIGBUS signal. By default SIGBUS kills the process and produces a core dump. For a server, being terminated this way can be costly.

This problem is usually solved with a file lease. The process first takes a lease on the file. When another process tries to truncate the file, the kernel sends the lease holder a real-time signal (RT_SIGNAL_LEASE here, a signal number chosen by the application) to announce that the file is about to be modified. The write call is then interrupted before the process can be killed by SIGBUS: it returns the number of bytes already written, and errno is set to success.

The usual practice is to take the lease before calling mmap and to release it once the operation is finished.
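The lease handling described above might look like the following sketch, which uses fcntl with F_SETSIG and F_SETLEASE. The helper names lease_acquire and lease_release, the handler, and the choice of SIGRTMIN + 1 for RT_SIGNAL_LEASE are assumptions for illustration. A read lease can only be taken on a descriptor opened read-only, and the call may fail on filesystems or with privileges that do not allow leases.

```c
#define _GNU_SOURCE               /* F_SETLEASE and F_SETSIG are Linux-specific */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Assumed choice: any otherwise unused real-time signal would do. */
#define RT_SIGNAL_LEASE (SIGRTMIN + 1)

/* Set when the kernel reports that another process wants to modify the file. */
static volatile sig_atomic_t lease_broken = 0;

static void on_lease_break(int sig)
{
    (void)sig;
    lease_broken = 1;
}

/* Install the handler and take a read lease on fd (opened O_RDONLY).
 * Returns 0 on success, -1 on error. */
int lease_acquire(int fd)
{
    assert(fd >= 0);

    struct sigaction sa = {0};
    sa.sa_handler = on_lease_break;   /* no SA_RESTART: write gets interrupted */
    sigemptyset(&sa.sa_mask);
    if (sigaction(RT_SIGNAL_LEASE, &sa, NULL) == -1)
        return -1;

    /* Ask for RT_SIGNAL_LEASE instead of the default SIGIO. */
    if (fcntl(fd, F_SETSIG, RT_SIGNAL_LEASE) == -1)
        return -1;
    return fcntl(fd, F_SETLEASE, F_RDLCK) == -1 ? -1 : 0;
}

/* Give the lease back once the mapping is no longer in use. */
int lease_release(int fd)
{
    return fcntl(fd, F_SETLEASE, F_UNLCK) == -1 ? -1 : 0;
}
```

A caller would typically wrap the mmap and write sequence between lease_acquire and lease_release, and stop using the mapping as soon as lease_broken becomes nonzero.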

Method 3: sendfile

Starting with kernel version 2.1, Linux provides sendfile, which also saves one copy.

#include <sys/sendfile.h>

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

sendfile is a data transfer interface that runs entirely in kernel mode. User mode does not take part, so the copy into user space is avoided naturally. Data is transferred from in_fd to out_fd: the file referred to by in_fd must support mmap-like operations, and out_fd must refer to a socket (since Linux 2.6.33 it may be any file). In other words, data can be sent from a file to a socket but not the other way around. Unlike the mmap approach, sendfile handles truncation of the file itself: the call is not killed by a signal when another process truncates the file.
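A minimal sketch of calling sendfile in a loop follows. The wrapper name send_file is an assumption for illustration, and out_fd would normally be a connected socket.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Send the whole file at path to out_fd without copying through user space.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_file(const char *path, int out_fd)
{
    assert(path != NULL && out_fd >= 0);

    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    off_t offset = 0;             /* sendfile advances this for us */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            close(in_fd);
            return -1;
        }
        if (n == 0)               /* file became shorter than fstat reported */
            break;
    }
    close(in_fd);
    return (ssize_t)offset;
}
```

Because the application never sees the data, this only works when the bytes can be sent exactly as they are stored in the file.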

Defects:

1) It is only applicable to applications that do not require user-mode processing.

Method 4: DMA assisted sendfile

The regular sendfile still performs one copy inside the kernel, from the kernel buffer to the socket buffer. Can this copy be removed as well?

The answer is DMA-assisted sendfile.

With help from the hardware, this method no longer copies the data from the kernel buffer to the socket buffer. Only the buffer descriptor is copied. The DMA engine then transfers the data directly from the kernel buffer to the protocol engine, which avoids the last CPU copy.

Defects:

1) In addition to the defect of Method 3 (sendfile), hardware and driver support are required.

2) It only applies to copying data from files to sockets.

Method 5: Splice

splice removes the restriction that limits sendfile to file-to-socket transfers: it can move data between arbitrary file descriptors.

#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <fcntl.h>

ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);

splice has a limitation of its own, however. It is built on the Linux pipe buffer mechanism, so at least one of its two file descriptor arguments must refer to a pipe.

splice provides a flow control mechanism that blocks write requests according to predefined watermarks. Experiments have shown that transferring data from one disk to another in this way increases throughput by 30% to 70% and cuts CPU load in half.
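Because one side must be a pipe, a file is usually spliced to its destination in two hops through an intermediate pipe. The sketch below shows that pattern; the name splice_file and the 64 KiB chunk size are assumptions for illustration.

```c
#define _GNU_SOURCE               /* splice is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define SPLICE_CHUNK 65536        /* assumed chunk size per hop */

/* Move the whole file at path to out_fd through a pipe using splice.
 * Returns the number of bytes moved, or -1 on error. */
ssize_t splice_file(const char *path, int out_fd)
{
    assert(path != NULL && out_fd >= 0);

    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    int p[2];
    if (pipe(p) < 0) {
        close(in_fd);
        return -1;
    }

    ssize_t total = 0;
    int failed = 0;
    while (!failed) {
        /* Hop 1: file -> pipe, no copy into user space. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, SPLICE_CHUNK, SPLICE_F_MOVE);
        if (n == 0)
            break;                /* end of file */
        if (n < 0) {
            failed = 1;
            break;
        }
        /* Hop 2: pipe -> destination, draining what hop 1 queued. */
        while (n > 0) {
            ssize_t w = splice(p[0], NULL, out_fd, NULL, (size_t)n, SPLICE_F_MOVE);
            if (w <= 0) {
                failed = 1;
                break;
            }
            n -= w;
            total += w;
        }
    }

    close(p[0]);
    close(p[1]);
    close(in_fd);
    return failed ? -1 : total;
}
```

The pipe is drained completely after every first hop, so it never fills up and the first splice call does not block waiting for space.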

Defects:

1) Like sendfile, it only suits programs that do not need to process the data in user mode.

2) At least one of the two descriptors must be a pipe.

Method 6: Copy-on-write

In some cases a kernel buffer may be shared by several processes. If one process writes to the shared area, the data seen by the others is corrupted, because write does not provide any locking.

Copy-on-write addresses this. When several processes share the same block of data and one of them needs to modify it, that process first copies the block into its own address space, so the other processes' view of the data is unaffected. A copy is made only when a process actually needs to modify the data, hence the name. This reduces overhead: a process that never changes the data it accesses never has to copy it.

Defects:

1) MMU support is required. The MMU has to know which pages in the process address space are read-only. When a write to one of these pages is attempted, an exception is raised to the operating system kernel, which allocates new storage for the page being written.
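One place where copy-on-write is visible from user space is a private file mapping. The sketch below (the name cow_demo is an assumption for illustration) writes to a MAP_PRIVATE mapping and then reads the file again to show that only the process's own copy of the page changed.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first len bytes of path privately, overwrite the first byte in
 * the mapping, and return the first byte as it is still stored in the file
 * (or -1 on error). */
int cow_demo(const char *path, size_t len, char new_byte)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* MAP_PRIVATE: pages are shared with the page cache until written. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[0] = new_byte;              /* write fault: the kernel copies the page */
    assert(p[0] == new_byte);     /* this process sees its private copy */

    char on_disk;
    ssize_t n = pread(fd, &on_disk, 1, 0);   /* the file itself is unchanged */

    munmap(p, len);
    close(fd);
    return n == 1 ? (unsigned char)on_disk : -1;
}
```

If the byte is never written, no page is copied at all, which is the saving that copy-on-write is meant to provide.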

Method 7: Buffer Sharing

This method rewrites the I/O operations completely. Traditional I/O interfaces are built on data copying, so to avoid copying, the original set of interfaces is dropped and redesigned. This makes it a more comprehensive zero-copy technique. A relatively mature solution at present is fbufs (Fast Buffers), first implemented on Solaris.

The idea behind fbufs is that each process maintains a buffer pool that is mapped into both the process address space and the kernel address space. The kernel and the user process share this pool, which avoids copying.

Defects:

1) Managing shared buffer pools requires close cooperation between applications, network software, and device drivers.

2) The API has to be rewritten, and the approach is still at the experimental stage.

(2) High-performance network I/O framework: netmap

Netmap is a high-performance framework for sending and receiving raw packets, built on the idea of shared memory. It was developed by Luigi Rizzo et al. and consists of a kernel module and user-mode library functions. Its goal is high-performance packet transfer between user mode and the network card without modifying existing operating system software and without special hardware support.

Under the netmap framework the kernel owns a packet pool, so the packets on the send ring and receive ring do not have to be allocated dynamically. When data arrives at the NIC, a packet buffer is taken directly from the pool, the data is placed in it, and the descriptor of that packet is placed on the receive ring. The packet pool in the kernel is mapped into user space with mmap. The user-mode program obtains the receive and send rings (netmap_ring) through netmap_if and uses them to receive and send packets.
