
Network splitter | 100Gbps DPI technology based on composite storage


Network splitter | Background and requirements

With the development of firewalls, intrusion detection systems, high-speed network management and control, CDN, operator signaling analysis, spam classification and other fields, the demands placed on deep packet inspection (DPI) and traffic classification over high-speed links keep rising. On the one hand, link bandwidth keeps growing; on the other hand, detection signatures are becoming more complex and more numerous. Together these trends place very high requirements on DPI technology.

Deep packet inspection is the process of using a set of predefined rules to match the packet payload (not just the header) at the flow level, and of deciding, based on the matching result, what action to take on the packet.

Composite storage technology | Network splitter

At present, DPI mainly uses two kinds of rules: keywords and regular expressions. Because high-performance regular expression matching is not yet practical (commercial regex matching engines have not exceeded 10 Gbps), keywords are still widely used. Even so, most DPI systems cannot complete full-payload keyword matching on links above 10 Gbps and can only inspect the leading bytes of each packet (for example, the first 32 or 64 bytes of the payload). Performance therefore remains the primary challenge for DPI. To improve DPI processing capacity, industry and academia commonly adopt the following three approaches:

(1) Rely on higher-performance servers or server clusters to raise DPI throughput. Because packet flows can easily be load-balanced across machines or cores, this approach readily achieves high aggregate performance, but its cost is relatively high.

(2) Rely on better software algorithms to improve matching performance. The research community generally takes the Aho-Corasick (AC) algorithm as the basis and optimizes it with techniques such as Wu-Manber, SBOM, multi-byte-per-step matching (commonly called multi-stride) and Bloom filters. However, successive state transitions in DPI are dependent on one another: the address of the next memory access is determined by the previous state and the current input bytes, so lookups cannot be issued ahead of time (the sketch after this list illustrates the dependence). Limited by the performance of the memory itself (mainly DDR), the room for purely software optimization is very small.

(3) Rely on the matching engines built into multi-core NPUs. The two major foreign multi-core NPU vendors, Broadcom and Cavium, both integrate DPI acceleration engines into their NPUs; Cavium's built-in HFA engine, for example, claims up to 24 Gbps of processing capacity per NPU. In practice, measured performance falls far short of the claims, and once regular expression rules are configured, rules containing wildcards degrade performance significantly.
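As an illustration of the dependence described in point (2), here is a minimal keyword-matching sketch in the Aho-Corasick style. It is not the engine discussed in this article, and the rule set and sample payload are made up; the point is simply that every payload byte triggers one state-table lookup whose address is unknown until the previous lookup returns, which is why per-access memory latency, not raw bandwidth, bounds matching speed.

```python
from collections import deque

def build_ac(keywords):
    """Build a small Aho-Corasick automaton: goto, fail and output tables."""
    goto, fail, out = [dict()], [0], [set()]
    for kw in keywords:                                # 1. trie of keywords
        s = 0
        for ch in kw:
            if ch not in goto[s]:
                goto.append(dict()); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(kw)
    q = deque(goto[0].values())                        # 2. failure links (BFS)
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f][ch] if ch in goto[f] else 0
            out[t] |= out[fail[t]]
    return goto, fail, out

def match(payload, goto, fail, out):
    """One dependent lookup per byte: the next state is only known after
    the previous lookup completes, so per-access latency bounds throughput."""
    s, hits = 0, []
    for i, ch in enumerate(payload):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i, kw) for kw in out[s])
    return hits

# Hypothetical rules and payload, for illustration only.
tables = build_ac(["cmd.exe", "exploit", "virus"])
print(match("GET /cmd.exe?x=exploit HTTP/1.1", *tables))
```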

In summary, although the research community and industry pay close attention to deep packet inspection, the development of DPI technology still cannot keep up with the practical requirements of its application fields.

In DPI, keyword or regular expression features are compiled into finite state automata (FSA), and the FSA state table is loaded into memory. During matching, each payload byte requires at least one table lookup to obtain the address of the next state to visit. Matching speed therefore depends on the number of memory accesses and the latency of each access; for a given packet, the number of accesses equals the payload length. To speed up matching, the latency of each memory access must be minimized.
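To make the latency constraint concrete, here is a back-of-the-envelope budget; the DDR random-access latency of roughly 50 to 100 ns is a typical figure assumed here, not a number from the article.

```latex
% One state-table lookup per payload byte at 100 Gbps:
R = 100\ \text{Gbps} = 12.5\times 10^{9}\ \text{bytes/s}
\quad\Longrightarrow\quad
t_{\text{byte}} = \frac{1}{12.5\times 10^{9}\ \text{/s}} = 0.08\ \text{ns per lookup.}

% A single sequential lookup stream against DDR with a random-access
% latency t_mem of 50--100 ns therefore sustains only
\frac{8\ \text{bits}}{100\ \text{ns}} \approx 0.08\ \text{Gbps}
\;\le\;
\frac{8\ \text{bits}}{t_{\text{mem}}}
\;\le\;
\frac{8\ \text{bits}}{50\ \text{ns}} \approx 0.16\ \text{Gbps},

% roughly three orders of magnitude short of 100 Gbps, which is why faster
% memory plus pipelining and parallel engines are required.
```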

In small networks with low link rates and few rules, each rule can be compiled into its own FSA and the state tables placed in high-speed memory, giving a high matching speed. However, network bandwidth has grown rapidly, 10-Gigabit links have reached campus networks, and rule sets have grown to hundreds or even thousands of rules, so compiling each rule into a separate FSA no longer meets the performance requirement. If all rules are instead compiled into a single FSA, state explosion can occur: the state table may exceed 100 gigabytes, far beyond the capacity of high-speed memory, so it can only be placed in slow storage such as external disk, and memory access latency rises sharply.
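The 100-gigabyte figure is easy to reproduce with a rough estimate. The sketch below assumes a flat DFA transition table with a full 256-entry row per state and 4-byte next-state pointers; the state counts are illustrative, not measurements from the article.

```python
def dfa_table_bytes(num_states, alphabet=256, ptr_bytes=4):
    """Flat DFA transition table: one next-state pointer per (state, byte)."""
    return num_states * alphabet * ptr_bytes

# Illustrative state counts: state explosion when compiling thousands of
# rules (especially regexes) into one combined DFA can reach ~1e8 states.
for states in (1_000, 1_000_000, 100_000_000):
    print(f"{states:>11,d} states -> {dfa_table_bytes(states) / 2**30:7.1f} GiB")
```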

In fact, the key to DPI performance is memory access, especially random memory access. If a storage structure can be designed that offers both high random-access throughput (tens of Gbps or more) and a reasonably large capacity (tens of megabytes), then with a well-optimized state table layout the accesses can be kept relatively clustered. Combined with optimizations of the memory access path such as pipelined accesses, bank interleaving and parallel accesses, the efficiency of state table lookups can be raised further and high-performance DPI becomes achievable.

The high-performance DPI technology developed by the Hunan Rong Teng network innovation team, with support from the National Natural Science Foundation, draws on the cache structure of computer systems. In a computer system, the principle of locality lets replacement policies such as first-in-first-out (FIFO) and least-recently-used achieve high cache hit rates. In deep packet inspection, however, packet content is essentially random, so it is hard to predict which state the next byte will lead to. Selecting the frequently visited states and keeping them in high-speed memory is the key to performance, and Rong Teng's Markov-based prediction technique solves this state-access prediction problem well.
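The article does not describe Rong Teng's Markov prediction technique in detail. The sketch below only illustrates the general idea under our own assumptions: treat the automaton as a Markov chain whose transition probabilities are estimated from a byte-frequency model or sampled traffic, approximate the stationary visit distribution, and pin the most frequently visited states in fast memory.

```python
import numpy as np

def hot_states(P, fast_capacity, iters=200):
    """Rank automaton states by estimated visit frequency.

    P[i, j] is the assumed probability of moving from state i to state j
    on the next payload byte. Power iteration approximates the stationary
    distribution pi (pi = pi @ P); the top states are pinned in fast memory.
    """
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
        pi /= pi.sum()
    return list(np.argsort(pi)[::-1][:fast_capacity])

# Toy 4-state automaton with made-up transition probabilities.
P = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.60, 0.20, 0.10, 0.10],
              [0.50, 0.10, 0.30, 0.10],
              [0.90, 0.05, 0.03, 0.02]])
print(hot_states(P, fast_capacity=2))   # states most worth keeping in fast memory
```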

Figure 1 Composite Storage matching engine

The matching engine is organized in two or even three storage layers, resolving the tension between performance and capacity through composite storage together with parallel and pipelined processing: the parallelism of the first-level matching engines provides high throughput, while the second-level storage provides a large state table space. The structure is suitable for both keyword matching and regular expression matching.
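A minimal sketch of the composite lookup path follows, under our own simplifying assumptions (two levels only, a plain dictionary standing in for on-chip fast memory, no pipelining or bank interleaving); it is not the PET160S/CNT16S implementation.

```python
class CompositeStateTable:
    """Two-level state table: hot states in small fast memory, the full
    table in large slow memory (e.g. DDR). Illustrative sketch only."""

    def __init__(self, full_table, hot_state_ids):
        self.slow = full_table                       # state -> list of 256 next states
        self.fast = {s: full_table[s] for s in hot_state_ids}
        self.fast_hits = self.slow_hits = 0

    def next_state(self, state, byte):
        row = self.fast.get(state)
        if row is None:                              # miss: pay the slow-memory latency
            row = self.slow[state]
            self.slow_hits += 1
        else:
            self.fast_hits += 1
        return row[byte]

    def scan(self, payload: bytes, start_state=0):
        s = start_state
        for b in payload:
            s = self.next_state(s, b)
        return s
```

The higher the share of lookups served by the fast level, the closer the effective access latency gets to fast-memory latency, which is exactly what the hot-state selection described above is meant to maximize.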

Productization and application of composite storage technology | Network splitter

To bring the composite storage technology into practical use, Rong Teng developed two systems: the standalone-chassis PET160S and the ATCA-based CNT16S. The PET160S achieves 70 to 90 Gbps of matching performance with 16K keyword rules, while the CNT16S achieves 40 Gbps with 4K regular expression rules. These devices label packets at the flow level: for a packet that matches a rule, the matched rule number is written into the packet header, so that the back-end analysis system can quickly dispatch the corresponding detection task according to the label.

According to our analysis, when supplemented by the hardware-software integrated flow table technology, overall packet processing capacity is generally about 4 times the throughput of the core matching engine: if the core engine delivers n Gbps, the system as a whole can process roughly 4n Gbps. On this basis the PET160S already provides keyword DPI for the full bandwidth of bidirectional 100 Gbps Ethernet, and the CNT16S can meet the regular expression DPI requirements of bidirectional 100 Gbps Ethernet under real network conditions (where combined uplink and downlink traffic does not exceed 80% of 200 Gbps).
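The article attributes the roughly 4x system-level gain to the flow table technology without giving details. One plausible mechanism, sketched below purely as an assumption on our part, is that only the first few packets of each flow traverse the matching engine, while later packets are tagged directly from the flow table.

```python
def offload_factor(packets, engine_match, inspect_first=4):
    """Flow-table offload sketch: packets are (flow_id, payload) pairs and
    engine_match(payload) returns a rule id or None. The 'first 4 packets
    per flow' threshold is an illustrative assumption, not the article's
    parameter."""
    flow_table = {}                        # flow_id -> {"seen": int, "rule": id}
    engine_bytes = total_bytes = 0
    for flow_id, payload in packets:
        total_bytes += len(payload)
        state = flow_table.setdefault(flow_id, {"seen": 0, "rule": None})
        if state["rule"] is None and state["seen"] < inspect_first:
            state["seen"] += 1
            engine_bytes += len(payload)   # this packet goes through the engine
            state["rule"] = engine_match(payload)
        # else: tag the packet with state["rule"] and bypass the engine
    return total_bytes / max(engine_bytes, 1)   # system-to-engine byte ratio
```

With long flows, only a small fraction of bytes ever reach the matching engine, so the system-level rate can be a small multiple of the engine rate, consistent with the 4n Gbps figure above.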

A PCI-E based DPI accelerator card is currently under development. In the near future a single card is expected to deliver about 40 Gbps of keyword matching and 20 Gbps of regular expression matching, providing hardware acceleration for firewalls, intrusion detection systems, high-speed network management and control, CDN and operator signaling analysis.

Concluding remarks | Network splitter

Keyword matching and regular expression matching are the core technologies of deep packet inspection (DPI). Growing rule-set complexity sharply increases the performance and capacity demands on memory, and the state table can far exceed the capacity of high-speed memory. Composite storage technology addresses this performance problem well and integrates smoothly with existing products that need DPI acceleration.
