How to use DPDK GRO and GSO to improve the performance of network applications

2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Many developers who are new to DPDK are unsure how to use its GRO and GSO libraries to improve the performance of network applications. This article summarizes the problem these libraries address and how they work.

At present, many network applications, such as firewalls, TCP/IP protocol stacks and software switches, process only packet headers and never touch the payload. For such applications, the cost of header processing (the "per-packet overhead") accounts for most of the total overhead. Reducing the cost of header processing is therefore the key to optimizing their performance.

The most direct way to reduce header processing overhead: reduce the number of packets

How can the number of packets be reduced?

● Increase the Maximum Transmission Unit (MTU). For a fixed amount of data, packets sent with a larger MTU each carry more data, so fewer packets are needed. However, the usable MTU depends on the physical links, and we cannot guarantee that every link a packet traverses supports a large MTU.

● Use NIC offload features: Large Receive Offload (LRO), UDP Fragmentation Offload (UFO) and TCP Segmentation Offload (TSO). As shown in figure 1, LRO merges TCP packets received from the physical link (such as 1500B) into longer TCP packets (such as 64KB); UFO and TSO split UDP and TCP packets with long payloads (such as 64KB) sent by upper-layer applications into shorter packets (such as 1500B) that fit the MTU of the physical link. Because the NIC does the merging and splitting, upper-layer applications handle far fewer, larger packets, and the merging and splitting costs no CPU cycles. However, LRO, TSO, and UFO usually handle only TCP and UDP packets, and not all NICs support these features.

● Use software packet merging (Generic Receive Offload, GRO) and packet splitting (Generic Segmentation Offload, GSO). Compared with the two approaches above, GRO and GSO have two advantages: first, they do not depend on physical links or NIC features; second, they can support more protocol types, such as VxLAN and GRE.

Figure 1. How LRO, UFO, and TSO work
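
To make the effect of packet size concrete, here is a small back-of-the-envelope sketch in Python. The helper `packets_needed` and the 40-byte header size (IPv4 plus TCP, no options) are assumptions made for illustration and do not come from the original article.

```python
import math

# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
HEADER_BYTES = 40


def packets_needed(data_bytes: int, mtu: int) -> int:
    """Number of packets needed to carry data_bytes of payload at a given MTU."""
    payload_per_packet = mtu - HEADER_BYTES
    return math.ceil(data_bytes / payload_per_packet)


data = 64 * 1024  # 64 KB of application data

standard = packets_needed(data, 1500)  # typical Ethernet MTU
jumbo = packets_needed(data, 9000)     # jumbo frames, if every link allows them

print(standard, jumbo)  # prints "45 8"
```

If header processing has a roughly fixed cost per packet, the same 64 KB costs 45 units of header work at a 1500-byte MTU and 8 units at 9000 bytes. Merging and splitting packets in the NIC or in software aims for a similar reduction without requiring a larger MTU on the link.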

To help DPDK-based applications such as Open vSwitch reduce header processing overhead, DPDK added GRO in release 17.08 and GSO in release 17.11. As shown in figure 2, GRO and GSO are two libraries in DPDK that the application calls directly to merge and segment packets.

Figure 2. DPDK GRO and DPDK GSO

1. GRO library and GSO library structure

Figure 3 depicts the structure of the GRO and GSO libraries. The GRO library defines a separate GRO type for each kind of packet, and each GRO type merges only that kind; for example, TCP/IPv4 GRO handles TCP/IPv4 packets. The GSO library likewise defines different GSO types. To dispatch an input packet to the right type, the GRO library checks the packet_type field of the MBUF, and the GSO library checks its ol_flags field.

Figure 3. Framework of GRO library and GSO library

2. How to use GRO library and GSO library?

Using the GRO and GSO libraries is simple. As shown in figure 4, a single function call is enough to merge packets or to segment a packet.

Figure 4. Code example

To support different usage scenarios, the GRO library provides two sets of APIs: a lightweight mode API and a heavyweight mode API, as shown in figure 5. The lightweight mode API is intended for scenarios where a small number of packets must be merged quickly, while the heavyweight mode API is intended for scenarios that need fine-grained control over merging and have a large number of packets to merge.

Figure 5. Lightweight mode API and heavyweight mode API
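
The difference between the two modes can be sketched with a toy Python model. This is not the DPDK API: `merge_contiguous`, `lightweight_gro` and `HeavyweightGro` are hypothetical names, packets are reduced to `(seq, payload)` pairs from a single flow, and the sketch assumes that the lightweight mode merges one burst per call without keeping state, while the heavyweight mode holds packets across bursts until the application flushes them.

```python
def merge_contiguous(packets):
    """Merge (seq, payload) pairs whose byte ranges are back to back."""
    merged = []
    for seq, payload in sorted(packets, key=lambda p: p[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == seq:
            last_seq, last_payload = merged[-1]
            merged[-1] = (last_seq, last_payload + payload)
        else:
            merged.append((seq, payload))
    return merged


def lightweight_gro(burst):
    """One call, no state: merge what is in this burst and return it all."""
    return merge_contiguous(burst)


class HeavyweightGro:
    """Keeps packets across bursts; the caller decides when to flush."""

    def __init__(self):
        self.held = []

    def reassemble(self, burst):
        self.held = merge_contiguous(self.held + burst)

    def flush(self):
        out, self.held = self.held, []
        return out


burst1 = [(0, b"a" * 100), (100, b"b" * 100)]
burst2 = [(200, b"c" * 100)]

# Lightweight: each burst is merged on its own, so two packets come out.
light_out = lightweight_gro(burst1) + lightweight_gro(burst2)

# Heavyweight: state survives between bursts, so all three merge into one.
ctx = HeavyweightGro()
ctx.reassemble(burst1)
ctx.reassemble(burst2)
heavy_out = ctx.flush()
```

In this model the lightweight path has no bookkeeping to manage, which suits quick merging of a few packets, while the stateful path merges more aggressively at the cost of the caller having to decide when to flush.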

3. The packet merging algorithm of DPDK GRO

Algorithm challenges

● In high-speed networks, an expensive packet merging algorithm is likely to cause packet loss at the NIC.

● Packet reordering makes merging harder. For example, Linux GRO cannot merge out-of-order packets.

DPDK GRO's packet merging algorithm must therefore be:

● Lightweight enough for high-speed networks

● Able to merge out-of-order packets

Key-based packet merging algorithm

To address these two challenges, DPDK GRO uses a key-based merging algorithm, whose flow is shown in figure 6. A newly arrived packet is first classified into a flow by its key, and the algorithm then looks for an adjacent packet (a "neighbor") in that flow to merge it with. If no matching flow exists, a new flow is created and the packet is stored in it. If the flow exists but contains no neighbor, the packet is stored in that flow unmerged.

The key-based merging algorithm has two notable properties. First, flow classification is a very lightweight way to speed up merging, because neighbors are searched for only within a packet's own flow. Second, packets that cannot be merged yet (such as out-of-order packets) are kept, so they can still be merged later; this reduces the impact of packet reordering on merging.

Figure 6. Flow of the key-based packet merging algorithm
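
The flow described above can be sketched as a toy model in Python. Everything here is illustrative rather than DPDK's implementation: the `Packet` tuple, the 4-tuple used as the key, and the `KeyBasedGro` class are assumptions, and the sketch ignores details such as TCP sequence number wraparound, flags, and table size limits.

```python
from collections import namedtuple

# Hypothetical packet model: only the fields the sketch needs.
Packet = namedtuple("Packet", "src dst sport dport seq payload")


def flow_key(pkt):
    """Assumed key: the TCP/IPv4 4-tuple that identifies a flow."""
    return (pkt.src, pkt.dst, pkt.sport, pkt.dport)


class KeyBasedGro:
    def __init__(self):
        # flow key -> list of items, each item is [start_seq, bytearray]
        self.flows = {}

    def insert(self, pkt):
        items = self.flows.get(flow_key(pkt))
        if items is None:
            # No matching flow: create it and store the packet there.
            self.flows[flow_key(pkt)] = [[pkt.seq, bytearray(pkt.payload)]]
            return
        for item in items:
            start, data = item
            if start + len(data) == pkt.seq:
                # Packet directly follows this item: append its payload.
                data.extend(pkt.payload)
                return
            if pkt.seq + len(pkt.payload) == start:
                # Packet directly precedes this item: prepend its payload.
                item[0] = pkt.seq
                item[1] = bytearray(pkt.payload) + data
                return
        # No neighbor yet: keep the packet so a later arrival can merge with it.
        items.append([pkt.seq, bytearray(pkt.payload)])

    def flush(self):
        """Return {flow key: [(seq, payload), ...]} and empty the table."""
        out = {
            key: [(seq, bytes(data))
                  for seq, data in sorted(items, key=lambda it: it[0])]
            for key, items in self.flows.items()
        }
        self.flows = {}
        return out


gro = KeyBasedGro()
a = ("10.0.0.1", "10.0.0.2", 1234, 80)
b = ("10.0.0.3", "10.0.0.2", 5678, 80)

# Flow a arrives out of order (seq 100 before seq 0); flow b has one packet.
gro.insert(Packet(*a, seq=100, payload=b"B" * 100))
gro.insert(Packet(*b, seq=0, payload=b"X" * 50))
gro.insert(Packet(*a, seq=0, payload=b"A" * 100))
gro.insert(Packet(*a, seq=200, payload=b"C" * 100))

merged = gro.flush()
```

In this run the three packets of flow `a` end up as a single 300-byte item even though the second one arrived first, and the packet of flow `b` is never compared against them because the key lookup separates the flows up front.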

4. Segmentation strategy of DPDK GSO

Segmentation process

As shown in figure 7, segmenting a packet takes three steps. First, the packet's payload is split into smaller parts. Second, a copy of the packet header is prepended to each part (the newly formed packet is called a GSO segment). Finally, the header of each GSO segment is updated (for example, the TCP sequence number).

Figure 7. GSO segmentation process
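
The three steps can be sketched in a few lines of Python. This is a simplified model, not the DPDK implementation: the header is a plain dictionary, the function name `gso_segment` and the fields `seq`, `ip_id` and `payload_len` are chosen for illustration, and checksums and TCP flags are left out.

```python
def gso_segment(header, payload, seg_size):
    """Split one large packet into GSO segments of at most seg_size payload bytes."""
    # Step 1: split the payload into smaller parts.
    chunks = [payload[off:off + seg_size]
              for off in range(0, len(payload), seg_size)]

    segments = []
    for i, chunk in enumerate(chunks):
        # Step 2: give each part its own copy of the header.
        hdr = dict(header)
        # Step 3: update the per-segment header fields.
        hdr["seq"] = (header["seq"] + i * seg_size) % 2**32  # TCP sequence number
        hdr["ip_id"] = (header["ip_id"] + i) % 2**16         # IPv4 identification
        hdr["payload_len"] = len(chunk)
        segments.append((hdr, chunk))
    return segments


big_header = {"seq": 1000, "ip_id": 7, "payload_len": 3000}
big_payload = bytes(range(256)) * 11 + bytes(184)  # 3000 bytes of test data

segs = gso_segment(big_header, big_payload, 1460)
for hdr, chunk in segs:
    print(hdr["seq"], hdr["ip_id"], len(chunk))
```

Each segment's sequence number advances by the number of payload bytes that precede it, so a receiver can reassemble the segments into the original byte stream.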

The structure of GSO Segment

The easiest way to generate a GSO segment is to copy both the packet header and the payload. However, frequent data copying degrades GSO performance, so DPDK GSO organizes each GSO segment with a zero-copy data structure, the Two-part MBUF. As shown in figure 8, a Two-part MBUF consists of one Direct MBUF and multiple Indirect MBUFs. The Direct MBUF stores the packet header, and each Indirect MBUF acts like a pointer to part of the payload. With a Two-part MBUF, generating a GSO segment requires copying only the short packet header, not the much longer payload.

Figure 8. Structure of Two-part MBUF
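
The idea can be imitated in Python, where a `memoryview` slice plays the role of an indirect MBUF: it refers to a region of the original buffer without copying it. The class `TwoPartSegment` and the function `segment_zero_copy` are hypothetical names for this sketch, which also uses a single view per segment for simplicity, whereas a real segment may reference several buffers.

```python
class TwoPartSegment:
    """One GSO segment: a private header plus views into the original payload."""

    def __init__(self, header, payload_views):
        self.header = bytearray(header)      # "direct" part: small copied header
        self.payload_views = payload_views   # "indirect" parts: no payload copy

    def wire_bytes(self):
        """Materialize the segment, e.g. for inspection; this step does copy."""
        return bytes(self.header) + b"".join(bytes(v) for v in self.payload_views)


def segment_zero_copy(header, payload, seg_size):
    view = memoryview(payload)
    return [
        TwoPartSegment(header, [view[off:off + seg_size]])
        for off in range(0, len(payload), seg_size)
    ]


original = bytearray(b"p" * 3000)
segments = segment_zero_copy(b"HDR", original, 1460)

# The views alias the original buffer: a change there is visible in the segment.
original[0] = ord("q")
first_byte = segments[0].payload_views[0][0]
```

Only the short header is duplicated per segment; the 3000-byte payload exists once, which is the property that makes the Two-part MBUF cheaper than copying header and payload for every segment.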

Status of GRO and GSO libraries

At present, the GRO library is still in its infancy and supports merging only for the most widely used packet type, TCP/IPv4. The GSO library supports a richer set of packet types, including TCP/IPv4, VxLAN, and GRE.
