
How to conduct a comprehensive analysis of DPDK

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains in detail how to analyze DPDK comprehensively: the problems it addresses, the techniques it relies on, and where it is applied.

A Comprehensive Analysis of DPDK, a High-Performance Network Technology

With the rapid rise of the cloud computing industry and continuous innovation in network technology, more and more network infrastructure is converging on architectures built on general-purpose processor platforms. The shift from traditional physical networks to virtual networks, and from flat network structures to hierarchical, SDN-based ones, reflects this innovation and convergence.

This not only makes networks more controllable and cost-effective, but also supports the performance needs of large numbers of users and applications, as well as the processing of massive amounts of data. It is a natural result of the continuous breakthroughs in high-performance network programming that have accompanied the evolution of network architecture.

Why is that? What are the disadvantages of transferring data through the OS kernel?

1. Interrupt processing. When a large number of packets arrive, the network card raises frequent hardware interrupt requests, which can preempt lower-priority soft interrupts or system calls that were already running. If these interruptions are frequent, the performance overhead is high.

2. Memory copy. Normally, a packet travelling from the network card to the application goes through the following steps: the data is transferred by DMA from the network card into a buffer allocated by the kernel, and is then copied from kernel space to user space. In the Linux kernel protocol stack, this time-consuming operation reportedly accounts for as much as 57.1% of the whole packet-processing path.

3. Context switching. Frequent hardware and soft interrupts may preempt running system calls at any time, which causes a lot of context-switching overhead. In addition, in a multi-threaded server design, scheduling between threads also causes frequent context switches. The overhead of lock contention is a similarly serious problem.

4. Loss of locality. Mainstream processors now have multiple cores, which means that a single packet may be handled on several CPU cores. For example, the interrupt for a packet may be taken on cpu0, its kernel processing may run on cpu1, and its user-space processing on cpu2. This easily invalidates CPU caches and destroys locality. On a NUMA architecture it also causes memory accesses across NUMA nodes, which hurts performance badly.

5. Memory management. Traditional servers use 4 KB memory pages. To speed up memory access and avoid cache misses, one can increase the number of entries in the cached page-mapping table (the TLB), but this reduces the CPU's lookup efficiency.

Taken together, these problems show that the kernel itself is a major bottleneck. The obvious solution is to find a way to bypass the kernel.
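To get a feel for how little time the kernel path has per packet, the following Python sketch works out the per-packet time budget at line rate. The link speed (10 Gbit/s), the 64-byte minimum frame, and the 20 bytes of per-frame wire overhead are illustrative assumptions, not figures from this article.

```python
# Per-packet time budget at line rate (all three constants are assumptions).
LINK_BITS_PER_SEC = 10_000_000_000  # assumed 10 Gbit/s Ethernet link
FRAME_BYTES = 64                    # assumed minimum-size Ethernet frame
WIRE_OVERHEAD_BYTES = 20            # preamble + start delimiter + inter-frame gap


def packets_per_second(link_bps: int, frame_bytes: int, overhead_bytes: int) -> float:
    """Maximum number of frames per second the link can carry."""
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_per_frame


def budget_ns(pps: float) -> float:
    """Time available to process one packet, in nanoseconds."""
    return 1e9 / pps


pps = packets_per_second(LINK_BITS_PER_SEC, FRAME_BYTES, WIRE_OVERHEAD_BYTES)
print(f"{pps / 1e6:.2f} Mpps, {budget_ns(pps):.1f} ns per packet")
# prints: 14.88 Mpps, 67.2 ns per packet
```

Under these assumptions the budget is roughly 67 ns per packet, which leaves very little room for an interrupt, a context switch, or an extra memory copy on every packet.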

Discussion of solutions

To address these disadvantages, the following techniques are worth discussing.

1. Separate the control layer from the data layer. Tasks such as packet processing, memory management, and processor scheduling move to user space, while the kernel handles only some of the control instructions. In this way, problems such as system interrupts, context switches, system calls, and kernel scheduling are largely avoided on the data path.

2. Use multi-core programming instead of multi-threading: set CPU affinity, bind threads to CPU cores one to one, and so reduce scheduling switches between them.

3. On NUMA systems, make each CPU core use the memory of its own NUMA node wherever possible, to avoid cross-node memory access.

4. Use huge pages instead of ordinary pages to reduce cache misses (specifically, TLB misses).

5. Adopt lock-free techniques to solve the problem of resource contention.

Thanks to the work of many earlier pioneers, the industry now has a number of excellent high-performance packet-processing frameworks that combine the techniques above, such as 6WIND, Wind River, netmap, and DPDK. Among them, Intel's DPDK stands out.

DPDK provides libraries and drivers for efficient packet processing in user space, originally for Intel processor architectures. Unlike Linux, which is designed for general-purpose use, DPDK focuses on high-performance packet processing in network applications.

In other words, DPDK bypasses the Linux kernel protocol stack and implements its own data plane in user space to send, receive, and process packets. From the kernel's point of view, a DPDK application is an ordinary user-mode process that is compiled, linked, and loaded in the same way as any other program.

The Breakthroughs of DPDK

Compared with traditional kernel-based network data processing, DPDK's major breakthrough is to move the network data flow from the kernel layer to the user layer. Let's first look at the difference between the traditional data flow and the data flow in DPDK.

Traditional Linux kernel network data flow:

Hardware interrupt -> distribute packets to kernel threads -> software interrupt -> kernel threads process packets in the protocol stack -> notify the user layer after processing -> network layer -> logic layer -> business layer

DPDK network data flow:

Hardware interrupt -> abandon interrupt processing -> user layer receives packets through device mapping -> enter user-layer protocol stack -> logic layer -> business layer

Let's take a look at the breakthroughs DPDK has made.

UIO (Userspace I/O) support

DPDK can bypass the kernel protocol stack essentially because of UIO. Through UIO, it can intercept interrupts and replace the interrupt callback behavior, thereby bypassing the rest of the kernel protocol stack's processing.

UIO devices work by exposing a file interface to user space. For example, when a UIO device uioX is registered, a file /dev/uioX appears. Mapping this file with mmap() gives access to the device's memory, and a blocking read() on it waits for the device's interrupts. In addition, the device can be inspected and controlled by reading and writing the files under /sys/class/uio.
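As a small illustration of the sysfs side of this interface, the Python sketch below lists registered UIO devices by reading the name and maps/map0/size attributes that the Linux UIO framework exposes under /sys/class/uio. The function name list_uio_devices and its sysfs_root parameter are hypothetical helpers introduced for this example; on a machine with no UIO devices it simply returns an empty list.

```python
from pathlib import Path


def list_uio_devices(sysfs_root: str = "/sys/class/uio") -> list:
    """Return the name and first memory-map size of each registered UIO device."""
    root = Path(sysfs_root)
    devices = []
    if not root.is_dir():  # no UIO support, or no device registered
        return devices
    for dev in sorted(root.iterdir()):
        info = {"device": dev.name, "name": None, "map0_size": None}
        name_file = dev / "name"
        size_file = dev / "maps" / "map0" / "size"
        if name_file.is_file():
            info["name"] = name_file.read_text().strip()
        if size_file.is_file():
            # The kernel prints the size in hexadecimal, e.g. "0x1000".
            info["map0_size"] = int(size_file.read_text().strip(), 16)
        devices.append(info)
    return devices


if __name__ == "__main__":
    for entry in list_uio_devices():
        print(entry)
```

The size reported for map0 is the length a user-space driver would pass to mmap() on the matching /dev/uioX file to reach the device's memory.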

Memory pool technology

DPDK implements a carefully designed memory pool in user space. Memory is not copied between kernel space and user space; only control of the buffer is transferred. This reduces the memory-copy overhead of sending and receiving packets.
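The idea can be sketched in a few lines of Python: allocate one block up front, carve it into fixed-size buffers, and hand out views of those buffers rather than copies. This is only a model of the concept, not DPDK's implementation (DPDK's pool is the C library rte_mempool); the class name BufferPool and the sizes used are assumptions made for illustration.

```python
class BufferPool:
    """Fixed-size buffers carved out of one preallocated block."""

    def __init__(self, count: int, buf_size: int) -> None:
        self._buf_size = buf_size
        self._memory = bytearray(count * buf_size)  # allocated once, up front
        self._view = memoryview(self._memory)
        self._free = list(range(count))             # indices of free buffers

    def get(self):
        """Take a buffer: returns (index, writable view), or None if exhausted."""
        if not self._free:
            return None
        idx = self._free.pop()
        start = idx * self._buf_size
        return idx, self._view[start:start + self._buf_size]

    def put(self, idx: int) -> None:
        """Return a buffer to the pool; nothing is freed or copied."""
        self._free.append(idx)

    @property
    def available(self) -> int:
        return len(self._free)


pool = BufferPool(count=4, buf_size=2048)
idx, buf = pool.get()
buf[:5] = b"hello"        # the "driver" writes the packet in place
consumer_view = buf[:5]   # the "application" gets a view, not a copy
print(bytes(consumer_view), pool.available)  # prints: b'hello' 3
pool.put(idx)
```

Handing over an index and a view is the Python analogue of passing a pointer: ownership of the buffer changes hands while the bytes stay where they are.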

Huge page memory management

DPDK implements an API for allocating, using, and releasing huge-page memory. Upper-layer applications can use huge pages easily through this API, and it remains compatible with ordinary memory allocation.
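A quick calculation shows why page size matters. The Python sketch below counts how many pages, and therefore how many address translations, are needed to cover a buffer region at different page sizes. The 1 GiB region size is an assumption for illustration; 2 MiB and 1 GiB are the huge page sizes commonly available on x86-64.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages (and hence address translations) covering a region."""
    return -(-region_bytes // page_bytes)  # ceiling division


region = 1 * GIB  # assumed size of a packet-buffer region
for label, page in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
    print(f"{label} pages: {pages_needed(region, page)}")
# prints:
# 4 KiB pages: 262144
# 2 MiB pages: 512
# 1 GiB pages: 1
```

With far fewer translations to cache, the same region fits in the TLB much more easily, so lookups miss less often.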

Lock-free ring queue

DPDK implements its own lock-free ring, in the same spirit as the Linux kernel's lock-free ring buffer kfifo. It supports single-producer/single-consumer and multi-producer/multi-consumer enqueue and dequeue operations, which reduces performance overhead while still keeping data consistent during transfer.
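The single-producer/single-consumer case can be sketched as follows. This Python class only illustrates the index arithmetic: a power-of-two capacity, a head index written solely by the producer, and a tail index written solely by the consumer. It is not DPDK's rte_ring; a real C implementation also needs memory barriers, and the multi-producer case needs atomic compare-and-swap, neither of which is modelled here. The name SpscRing is hypothetical.

```python
class SpscRing:
    """Single-producer/single-consumer ring with power-of-two capacity."""

    def __init__(self, size: int) -> None:
        if size < 2 or size & (size - 1):
            raise ValueError("size must be a power of two, at least 2")
        self._slots = [None] * size
        self._mask = size - 1
        self._head = 0  # next slot to write; only the producer updates it
        self._tail = 0  # next slot to read; only the consumer updates it

    def enqueue(self, item) -> bool:
        if self._head - self._tail == len(self._slots):
            return False                       # ring is full
        self._slots[self._head & self._mask] = item
        self._head += 1                        # publish after the slot is written
        return True

    def dequeue(self):
        if self._head == self._tail:
            return None                        # ring is empty
        item = self._slots[self._tail & self._mask]
        self._tail += 1                        # release the slot after reading
        return item


ring = SpscRing(4)
print([ring.enqueue(p) for p in "abcde"])  # prints: [True, True, True, True, False]
print(ring.dequeue(), ring.dequeue())      # prints: a b
```

Because each index has exactly one writer, neither side ever waits for a lock; the producer only checks how far the consumer has advanced, and vice versa. Note that this sketch uses None to signal an empty ring, so None itself cannot be stored as an item.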

Poll-mode network card driver

The DPDK NIC driver abandons interrupt mode entirely and receives packets by polling, which avoids the interrupt overhead.
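The shape of a polling receive loop can be sketched in Python as below. FakeRxQueue is a hypothetical stand-in for a NIC receive queue, and the burst size of 32 is an assumption; in DPDK itself the corresponding burst receive call is rte_eth_rx_burst in C.

```python
from collections import deque

BURST_SIZE = 32  # assumed maximum number of packets taken per poll


class FakeRxQueue:
    """Stand-in for a NIC receive queue that hardware fills via DMA."""

    def __init__(self, packets) -> None:
        self._pending = deque(packets)

    def rx_burst(self, max_packets: int) -> list:
        """Return up to max_packets packets without blocking (may be empty)."""
        burst = []
        while self._pending and len(burst) < max_packets:
            burst.append(self._pending.popleft())
        return burst


def poll_loop(queue: FakeRxQueue, max_polls: int) -> tuple:
    """Busy-poll the queue; returns (packets handled, empty polls)."""
    handled = empty_polls = 0
    for _ in range(max_polls):        # a real data plane would loop forever
        burst = queue.rx_burst(BURST_SIZE)
        if not burst:
            empty_polls += 1          # nothing arrived; keep spinning, never sleep
            continue
        handled += len(burst)         # process the whole burst in one pass
    return handled, empty_polls


print(poll_loop(FakeRxQueue(range(100)), max_polls=6))  # prints: (100, 2)
```

The trade-off is visible in the second number: polling spends CPU cycles even when no packets arrive, in exchange for never paying the cost of an interrupt.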

NUMA

When allocating memory, DPDK uses the memory information provided by the proc filesystem so that each CPU core uses memory close to its own node wherever possible. This avoids the performance cost of remote memory access across NUMA nodes.
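To make "memory close to its own node" concrete, the Python sketch below builds a CPU-to-NUMA-node map on Linux. As an assumption for this example, it reads the cpulist files under /sys/devices/system/node rather than proc; the function names parse_cpulist and cpu_to_node are hypothetical helpers, and on a system without that directory the map is simply empty.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list:
    """Expand a kernel CPU list such as "0-3,8-11" into individual CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpu_to_node(sysfs_root: str = "/sys/devices/system/node") -> dict:
    """Map each CPU id to its NUMA node; empty if the path is unavailable."""
    mapping = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return mapping
    for node_dir in root.glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            for cpu in parse_cpulist(cpulist.read_text()):
                mapping[cpu] = node_id
    return mapping


if __name__ == "__main__":
    print(cpu_to_node())
```

With such a map, a packet-processing thread pinned to a given CPU can be given buffers allocated on the same node, so its memory accesses stay local.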

CPU affinity

DPDK uses CPU affinity to bind one or more threads to one or more CPUs, so that a running thread is not rescheduled arbitrarily. On the one hand, this reduces the overhead of frequent switching between threads; on the other, it avoids invalidating the local CPU cache and increases the cache hit rate.
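On Linux, the same mechanism is available from Python through os.sched_setaffinity, which makes for a compact illustration. This is a minimal, Linux-only sketch; the helper name pin_to_cpu is hypothetical, and it says nothing about how DPDK configures affinity internally.

```python
import os


def pin_to_cpu(cpu: int) -> set:
    """Restrict the caller to one CPU and return the resulting affinity mask."""
    os.sched_setaffinity(0, {cpu})   # 0 means "the caller"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed before:", allowed)
    print("after pinning:", pin_to_cpu(allowed[0]))
```

Once pinned, the scheduler keeps the code on that one core, so the data it touches tends to stay in that core's caches.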

Multi-core scheduling framework

On a multi-core architecture, DPDK generally divides cores into a master core and slave cores (called the main lcore and worker lcores in recent DPDK releases). The master core initializes each module, and the slave cores carry out the actual packet-processing work.
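The division of labour can be modelled with a short Python sketch: a main routine prepares the work and launches one worker per allowed CPU, and each worker pins itself and processes only its own share. The function names, the cap of four workers, and the "sum the packet lengths" stand-in for real processing are all assumptions made for illustration; DPDK itself does this in C with one thread per lcore.

```python
import os
import threading


def worker(cpu: int, packets: list, results: dict) -> None:
    """Worker core: pin to its CPU, then process only its own packets."""
    if hasattr(os, "sched_setaffinity"):          # Linux only; skipped elsewhere
        os.sched_setaffinity(0, {cpu})
    results[cpu] = sum(len(p) for p in packets)   # stand-in for real processing


def run_main_core(packets: list, max_workers: int = 4) -> dict:
    """Main core: set up the work, launch the workers, wait for them."""
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))[:max_workers]
    else:
        cpus = [0]
    # Give each worker a disjoint share so workers need no locks between them.
    shares = {cpu: packets[i::len(cpus)] for i, cpu in enumerate(cpus)}
    results = {}
    threads = [threading.Thread(target=worker, args=(cpu, shares[cpu], results))
               for cpu in cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


totals = run_main_core([b"x" * n for n in range(1, 101)])
print(sum(totals.values()))  # prints: 5050
```

Because every worker owns its share outright, the only synchronization point is the final join on the main core.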

Beyond the points above, DPDK includes many other technical advances, which can be summarized in the following diagram.

Applications of DPDK

As an excellent user-space packet acceleration suite, DPDK is now used as a "glue" module in a number of network data processing solutions to improve their performance. Some of these applications are listed below.

Data plane (virtual switch):

OVS

Open vSwitch is a multi-core virtual switch platform that supports standard management interfaces, open and extensible programmable interfaces, and access by third-party controllers.

VPP

VPP is a high-performance packet-processing framework open-sourced by Cisco. It provides switching and routing capabilities and can be used as a virtual switch in a virtualized environment. In an SDN-like processing framework, it often acts as the data plane. Some studies suggest that VPP outperforms OVS with DPDK, but that it is better suited to NFV and to network modules with a specific function.

Lagopus

Lagopus is another multi-core virtual switch implementation. Like OVS, it supports a variety of network protocols, such as Ethernet, VLAN, QinQ, MAC-in-MAC, MPLS, and PBB, as well as tunneling protocols such as GRE, VXLAN, and GTP.

Snabb

Snabb is a simple and fast packet processing toolkit.

Data plane (virtual router):

OpenContrail

A virtual router integrated with an SDN controller. It is now mostly used with OpenStack, where it works with Neutron to provide one-stop networking support.

CloudRouter

A distributed router.

DPDK bypasses the Linux kernel protocol stack and accelerates data processing. Users can build a custom protocol stack in user space to meet the needs of their own applications. There are now many high-performance network frameworks based on DPDK: OVS and VPP are commonly used data plane frameworks, and mTCP and F-Stack are commonly used user-mode protocol stacks. Many large companies use DPDK to optimize network performance.

