When P4 Meets NAT64: How UCloud Quickly Evolved from IPv4 to IPv6

2025-05-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

As the Internet has grown, IPv4 has shown many shortcomings, such as address exhaustion, weak guarantees of security and quality of service, and routing table growth. These problems will seriously constrain cloud computing and related IT industries. IPv6 addresses these shortcomings well thanks to its larger address space and stronger security.

UCloud began developing IPv6 conversion for its public network entry points in the first half of 2018. Built on NAT64 and programmable P4 switches, the IPv6 conversion service for the UCloud public network entry has now launched successfully. The product is simple to use: after applying for an EIP, you can turn on IPv6 conversion with one click and serve IPv6 clients without modifying your business. At present, the UCloud IPv6 conversion service is used with CVM, EIP, load balancer, container cluster, bastion host and other products. A single cluster in a region (16 NAT64 servers and 4 P4 switches) can reach a maximum of 3.2M CPS (connections per second) and 1.6G concurrent connections, and can be expanded smoothly as the system evolves.

Strategic steps of UCloud IPv6 evolution

However, we need to be clear that transforming network infrastructure cannot be done overnight. It involves not only solving technical problems but also a very large engineering effort.

Most importantly, users' businesses must be migrated to IPv6 gradually, without affecting their existing services. With this in mind, UCloud developed the following strategy for the evolution from IPv4 to IPv6:

1. Complete the IPv6 conversion service for the Internet entry in 2018, so that more than 50% of UCloud products support IPv6. Customers only need to turn on the IPv6 conversion service for an EIP, and their business can then serve IPv6 clients without any change, connecting smoothly to the IPv6 network.

2. Complete the IPv6 transformation of the management network in 2018, so that cloud products that rely on the management network, such as host intrusion detection and the container image registry, support IPv6.

3. Make the main UCloud products support IPv6 in 2019. Important products such as VPC and ULB (UCloud load balancing) will support IPv6 before Q2 2019 and will be able to actively access IPv6 networks.

4. Complete the dual-stack transformation of the IDCs in 2019, so that the data centers support IPv6 and provide complete IPv6 support.

In taking IPv6 from technology to deployment, UCloud did a great deal of work and met many challenges. The rest of this article describes the implementation and optimization of the UCloud IPv6 conversion service from a technical point of view.

UCloud IPv6 conversion service

◆ NAT64 and its Technical Challenge

In terms of implementation, UCloud uses stateful NAT64 to implement the IPv6 conversion service. NAT64 is an IPv6 transition mechanism that enables communication between IPv6 and IPv4 hosts through network address translation (NAT). A NAT64 gateway needs at least one IPv4 address and an IPv6 network segment containing a 32-bit address space (for example, a /96 prefix) to translate between the IPv4 and IPv6 protocols.

The NAT64 gateway creates a mapping between IPv6 and IPv4 addresses, which can be configured manually or determined automatically.
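One common automatic mapping embeds the 32-bit IPv4 address in the low 32 bits of a /96 IPv6 prefix. The sketch below illustrates this with Python's standard `ipaddress` module. It is only an illustration: the prefix used is the RFC 6052 well-known prefix `64:ff9b::/96`, chosen as a stand-in because the article does not give UCloud's actual prefix or say how its mapping is derived, and the function names are hypothetical.

```python
import ipaddress

# Assumption: the RFC 6052 well-known prefix stands in for the region's real /96 prefix.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def ipv4_to_nat64(ipv4, prefix=NAT64_PREFIX):
    """Embed a 32-bit IPv4 address in the low 32 bits of a /96 IPv6 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def nat64_to_ipv4(ipv6, prefix=NAT64_PREFIX):
    """Recover the embedded IPv4 address from an IPv6 address inside the prefix."""
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in prefix:
        raise ValueError(f"{addr} is not inside {prefix}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    v6 = ipv4_to_nat64("192.0.2.33")
    print(v6, "<->", nat64_to_ipv4(str(v6)))
```

Because the mapping is purely arithmetic, it needs no per-address state. The state in a stateful NAT64 gateway comes from tracking sessions, not from deriving addresses.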

As shown in the figure below, the UCloud IPv6 conversion function is implemented on NAT64 and generates an IPv6 address for a user's existing IPv4 EIP. Without any change to the existing IPv4 network and services, the corresponding cloud resources and services become reachable over public IPv6, so they can be accessed over IPv4 and IPv6 at the same time.

Translation between IPv6 and IPv4 is stateful. Considering the stability and scalability requirements of the whole system, there are two key technical points in implementing the IPv6 conversion service.

High availability. Since the IPv6 conversion service is stateful, existing connections must not be affected when the nodes in the cluster change (more precisely, no more than 1/n of them may be affected, where n is the number of back-end nodes).

Security protection. Because the IPv6 conversion service is stateful, a malicious attack (such as DDoS) can easily overwhelm the servers and make the service unavailable. A certain level of DDoS protection is therefore very important to the whole system.

◆ System architecture

Based on these key points, we designed a system architecture for IPv6 conversion based on NAT64 and P4 switches, as shown in the following figure. NAT64 Access is implemented on P4 switches, and high availability is achieved through consistent hashing in NAT64 Access. CPS rate limiting is also performed in NAT64 Access to provide DDoS protection.

NAT64 Access and physical switch 1 form a layer 3 network. NAT64 Access announces a /96 IPv6 prefix to physical switch 1 through BGP as the IPv6 prefix of the region. The Access switches in POP1 and POP2 announce the same prefix, which gives load sharing and disaster recovery at the POP level. Likewise, the two Access switches within a POP share load and back each other up.

Access forms a layer 2 network with physical switch 2 and the NAT64 servers. Each NAT64 server announces its VIP to Access through BGP, and Access thereby obtains the next-hop information (MAC address) for each VIP. When an IPv6 packet arrives from the Internet, Access sets the packet's destination MAC address to the MAC address of one NAT64 server, so that the packet is delivered to that specific server. In the southbound direction, each NAT64 server also needs to announce a source address pool to physical switch 4 so that return traffic is reachable.

Note that in actual deployments, physical switches 1 and 2 are usually the same device, with NAT64 Access attached to it in a side-attached (one-arm) fashion. Taking a CVM as an example, the working mechanism of the whole system is briefly described through the following business process:

Business process

Because each NAT64 server is configured with its own non-overlapping source address pool (where IPv4_1 is an address in the source address pool and IPv4_2 is the EIP) and advertises that pool southbound, the CVM's response packet (whose destination address is IPv4_1, an address in the source address pool) is routed back to the corresponding NAT64 server.

◆ When P4 meets NAT64

NAT64 Access supports high availability

The system architecture above shows that POP-level load sharing and disaster recovery can be achieved through physical switches. The real key to high availability, however, is that when the state of the NAT64 service nodes changes (for example, capacity expansion or a node going down), existing connections must not be broken. This requires NAT64 Access to use consistent hashing when selecting back-end nodes. In essence, NAT64 Access is a consistent-hash gateway.

Among major cloud vendors, DPDK is currently the mainstream way to implement a consistent-hash gateway. But DPDK has the following drawbacks:

DPDK-based applications can achieve a high packet forwarding rate, but they do so through load balancing across multiple servers and multiple cores. The load balancing algorithm is usually implemented by the hardware switch or the network card and cannot be defined in software. If the network carries a single elephant flow that the hardware load balancing algorithm cannot spread out, it congests a single link or a single CPU core, which has a great impact on the business.

As network bandwidth evolves from 10G to 25G, 40G, 50G and 100G, DPDK needs more powerful CPUs to reach line rate, and more powerful CPUs are usually very expensive, which hurts cost-effectiveness. In particular, the higher the clock frequency of a single core, the higher the price, and price rises non-linearly with frequency.

We therefore decided to implement NAT64 Access on P4 programmable switches (based on the Barefoot Tofino chip). In fact, UCloud began pre-research on P4 programmable switches as early as 2017, and gateways based on P4 programmable switches were already in gray release. Compared with DPDK gateways, P4 programmable switches have many advantages:

1. Higher forwarding performance than DPDK (1.8T~6.4T)

2. Stable forwarding performance that is not affected by CPU load

3. 100G per link, which avoids single-link congestion

4. P4 offers good openness and programmability

5. A good ecosystem, with P4 Runtime support

Maglev Hash

Selection and Verification of Maglev algorithm

For the consistent hash algorithm, we chose the one used in the Google Maglev project (hereinafter Maglev Hash). Its core advantage is that it is extremely stable when the set of back-end service nodes changes. The size of its Lookup table also stays fixed, which makes the table well suited to being hosted on a P4 switch. (Original paper: Maglev: A Fast and Reliable Software Network Load Balancer)

According to the paper's description of its consistent hashing, Maglev Hash lets each back end fill empty entries of the Lookup table array according to certain rules, which ensures that all back-end servers appear in the table as evenly as possible (in essence, the node that appears most often and the node that appears least often differ by at most 1). The load is therefore balanced almost perfectly.

Although the Lookup table generated by Maglev Hash is extremely well balanced, it has a drawback: when the back-end service nodes change, some connections are broken (an ideal consistent hash algorithm breaks 1/N of the connections; Maglev Hash may break slightly more than 1/N).

In the Maglev project, a connection tracking table makes up for this defect, but connection tracking brings its own drawbacks: it makes NAT64 Access stateful and vulnerable to attack. The paper shows that when the Lookup table size is more than 100 times the number of back-end nodes, fewer than 2% of connections are disrupted. However, 2% is still relatively high. To be rigorous, we ran a series of tests on the ratio of Lookup table size to the number of back-end nodes under capacity expansion and reduction (reduction also corresponds to a NAT64 server going down).

The figures above correspond to adding and removing back-end nodes, respectively. The tests show that the algorithm is slightly less stable when the Lookup table is small. For example, with a Lookup table size of 1024, nearly 2% of connections are affected in the expansion and reduction scenarios (seen as changed entries), which is basically consistent with the conclusion of the Maglev paper.

However, as the Lookup table grows, the impact on existing connections becomes smaller and smaller and finally approaches the ideal 1/N. In the two figures above, when the Lookup table size is more than 2000 times the number of back-end service nodes, the proportion of connection interruptions is less than 0.01%. Moreover, without connection tracking the entire NAT64 Access is stateless, which greatly improves its stability and greatly reduces implementation complexity.
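The table construction and the entry-change measurement discussed in this section can be sketched in a few lines of Python. This is a minimal illustration of the population procedure described in the Maglev paper, not UCloud's Maglev Hash Engine. The hash function (SHA-256 with a salt), the back-end names and the table size 2053 (a prime) are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def _hash(name, salt):
    """Stable 64-bit hash (hashlib-based, so results do not vary between runs)."""
    digest = hashlib.sha256(f"{salt}:{name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def maglev_table(backends, size):
    """Fill a lookup table of `size` slots; `size` should be a prime > len(backends)."""
    if not backends:
        raise ValueError("at least one backend is required")
    # Each backend walks its own permutation of the slots: offset + j * skip (mod size).
    offsets = [_hash(b, "offset") % size for b in backends]
    skips = [_hash(b, "skip") % (size - 1) + 1 for b in backends]
    next_pos = [0] * len(backends)
    table = [None] * size
    filled = 0
    while True:
        # Backends take turns, so their entry counts never differ by more than 1.
        for i, backend in enumerate(backends):
            slot = (offsets[i] + next_pos[i] * skips[i]) % size
            while table[slot] is not None:  # skip slots already taken
                next_pos[i] += 1
                slot = (offsets[i] + next_pos[i] * skips[i]) % size
            table[slot] = backend
            next_pos[i] += 1
            filled += 1
            if filled == size:
                return table


def changed_fraction(old, new):
    """Fraction of table entries whose backend differs between two tables."""
    return sum(a != b for a, b in zip(old, new)) / len(old)


if __name__ == "__main__":
    nodes = [f"nat64-{i}" for i in range(16)]   # hypothetical backend names
    before = maglev_table(nodes, 2053)
    after = maglev_table(nodes[:-1], 2053)      # one node removed
    counts = Counter(before)
    print("max-min entries per backend:", max(counts.values()) - min(counts.values()))
    print("entries changed after removing one node:", changed_fraction(before, after))
```

Removing one of N nodes must change at least the entries that node owned, roughly 1/N of the table. Anything above that is the extra disruption that a larger table is meant to shrink.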

How NAT64 Access works

A Lookup table is hosted on the NAT64 Access in the following format:

Note that the Lookup table here is actually made up of several Maglev Lookup tables, distinguished by VIP.

The specific working mechanisms are as follows:

Different NAT64 clusters announce different VIPs to NAT64 Access through BGP. NAT64 Manager reads the routing table and neighbor table of NAT64 Access through gRPC and obtains each VIP together with the MAC addresses of its next hops. It then iterates over all the VIPs and calls the Maglev Hash Engine with the next-hop information of each VIP to generate the entry list for each cluster (each value is the MAC address of a NAT64 server). All the entry lists, together with their VIPs, form the Lookup table above.

When a packet arrives on the data plane, the VIP is looked up from the EIP (this is done by other tables and the corresponding control-plane logic, and is not covered here). The entry index is then computed by applying the hash function of the P4 language to the source IP, destination IP, source port and destination port. Finally, the Lookup table is matched on VIP and entry index and the destination MAC is set, which completes the selection of the back-end service node.

NAT64 Manager continuously monitors the routing table and neighbor table of NAT64 Access. Once the next hops of a VIP change (for example, on capacity expansion or when a NAT64 server goes down), it calls the Maglev Hash Engine again to regenerate the entries for that VIP and updates the changed entries through gRPC, so that node changes are handled quickly.
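The three steps above can be modeled in software as follows. This is a rough sketch under stated assumptions, not the P4 program or the NAT64 Manager itself: CRC32 from `zlib` stands in for the switch's hash function, a Python dict stands in for the exact-match table keyed by (VIP, entry index), and the VIP names, MAC addresses and function names are hypothetical. The entry lists are written by hand here; in the real system they come from the Maglev Hash Engine.

```python
import ipaddress
import struct
import zlib


def entry_index(src_ip, dst_ip, src_port, dst_port, table_size):
    """Hash the flow key to an entry index (CRC32 is a stand-in for the P4 hash)."""
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + struct.pack("!HH", src_port, dst_port))
    return zlib.crc32(key) % table_size


def build_lookup_table(entry_lists):
    """Flatten {vip: [mac, ...]} into {(vip, index): mac}, like an exact-match table."""
    return {(vip, i): mac
            for vip, entries in entry_lists.items()
            for i, mac in enumerate(entries)}


def select_backend(lookup, vip, table_size, flow):
    """Data-plane step: match on (VIP, entry index) and return the destination MAC."""
    return lookup[(vip, entry_index(*flow, table_size))]


def changed_entries(old, new):
    """Control-plane step: the entries that must be rewritten after regeneration."""
    return {key: mac for key, mac in new.items() if old.get(key) != mac}


if __name__ == "__main__":
    size = 5  # tiny table for illustration only
    lists = {"vip-a": ["02:00:00:00:00:01", "02:00:00:00:00:02",
                       "02:00:00:00:00:01", "02:00:00:00:00:02",
                       "02:00:00:00:00:01"]}
    lookup = build_lookup_table(lists)
    flow = ("2001:db8::1", "2001:db8::2", 40000, 443)
    print(select_backend(lookup, "vip-a", size, flow))
```

Since the index depends only on the flow's addresses and ports, every packet of a connection reaches the same NAT64 server as long as its entry is unchanged, which is why only the changed entries need to be pushed to the switch.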

NAT64 Access DDoS security protection

Because the IPv6 conversion service is stateful and can therefore be hit by DDoS attacks, we impose a per-EIP PPS rate limit on TCP SYN packets in NAT64 Access. UCloud already has strong security protection and DDoS detection and cleaning at its public network access layer, so the SYN rate limit on NAT64 Access serves only as a supplementary, secondary protection. Its advantage is that it applies directly, without detection and analysis, which can shorten the time the NAT64 service is unavailable in extreme attack scenarios (the full DDoS protection of the security center usually involves steps such as detection and analysis, and so has some lag).

At present, SYN packets for a single EIP are limited to 50,000 PPS, and packets beyond that rate are dropped. This parameter is adjustable: if a user needs a very large CPS at the business level, UCloud technical support can help adjust it.
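A per-EIP limit of this kind can be modeled with a token bucket. The sketch below is only a software illustration of the behavior: the article does not say how the limit is implemented on the switch, and the class name, the burst size and the explicit `now` timestamp (passed in to keep the example deterministic) are assumptions.

```python
class SynRateLimiter:
    """Per-EIP token bucket: about `rate` SYN packets per second, bursts up to `burst`."""

    def __init__(self, rate=50_000, burst=None):
        self.rate = float(rate)
        # Assumption: by default the bucket holds one second's worth of tokens.
        self.burst = float(rate if burst is None else burst)
        self._buckets = {}  # eip -> (tokens, timestamp of last update)

    def allow(self, eip, now):
        """Return True to forward a SYN for `eip` at time `now` (seconds), False to drop."""
        tokens, last = self._buckets.get(eip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the bucket size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[eip] = (tokens - 1.0, now)
            return True
        self._buckets[eip] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = SynRateLimiter(rate=5, burst=5)  # small numbers for illustration
    decisions = [limiter.allow("eip-1", now=0.0) for _ in range(6)]
    print(decisions)  # the sixth SYN in the same instant is dropped
```

Each EIP has its own bucket, so a flood aimed at one EIP does not consume the budget of another.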

P4 table configuration optimization

The Tofino chip contains 4 pipelines, and each pipeline contains 12 stages. At present, the mainstream approach is for all pipelines to use the same table configuration, that is, the same P4 code. But two tables that depend on each other cannot be placed in the same stage, a constraint imposed by the execution logic of the underlying chip.

Because the business logic is complex, the data plane usually needs many tables to implement it, and these tables depend heavily on one another. As a result, the stages are used up during actual coding while the resource utilization of each stage remains very low. For our project, low resource utilization limits the number of EIPs that one NAT64 Access can support. There are usually three solutions to this problem:

1. Optimize the table configuration or modify some business logic to reduce the interdependence between tables, which can greatly improve the utilization of stage resources.

2. Split the business logic between ingress and egress. Although ingress and egress share stages, there is no hardware dependency between ingress tables and egress tables, so one part of the logic can run on ingress and the other on egress. This scheme is relatively simple and feasible, and usually gives obvious results.

3. Chain the pipelines in series: split the business logic so that each pipeline holds one part, and pass information between pipelines through custom metadata. This scheme is also effective and increases the overall table space available on Tofino. Many applications of this kind can be expected in the future.

In the NAT64 project, we adopted a combination of solutions 1 and 2; after optimization, resource utilization reached about 70% (up from only about 30% before). The following figure shows the resource utilization and table utilization of our optimized Tofino chip.
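The stage constraint described in this section can be illustrated with a small model: if a table must sit in a later stage than every table it depends on, the number of stages consumed is at least the length of the longest dependency chain, however little memory each table uses. The sketch below computes that bound. It is a deliberate simplification that ignores per-stage memory and other limits a real compiler enforces, and the table names are hypothetical rather than taken from the NAT64 Access program.

```python
def assign_stages(deps):
    """Map each table to the earliest stage it can occupy.

    `deps[t]` lists the tables that must run before table `t`; a table is
    placed one stage after its latest dependency (stage numbers start at 0).
    """
    stage = {}

    def place(table, visiting=()):
        if table in stage:
            return stage[table]
        if table in visiting:
            raise ValueError(f"cyclic dependency involving {table}")
        earliest = 1 + max(
            (place(dep, visiting + (table,)) for dep in deps.get(table, ())),
            default=-1,
        )
        stage[table] = earliest
        return earliest

    for table in deps:
        place(table)
    return stage


def stages_used(deps):
    """Number of stages consumed, i.e. the longest dependency chain."""
    return max(assign_stages(deps).values(), default=-1) + 1


if __name__ == "__main__":
    # Hypothetical tables forming one long chain on ingress.
    chain = {"eip_to_vip": [], "syn_rate_limit": ["eip_to_vip"],
             "maglev_lookup": ["syn_rate_limit"], "set_dst_mac": ["maglev_lookup"]}
    print(stages_used(chain))
    # Removing one dependency lets two tables share a stage.
    relaxed = dict(chain, maglev_lookup=["eip_to_vip"])
    print(stages_used(relaxed))
```

In this model, solution 1 corresponds to removing edges from the dependency graph, and solution 2 corresponds to splitting one graph into an ingress part and an egress part whose chains are counted separately.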

System performance test

After building the system, we ran a complete performance test on a single NAT64 server (CPU: 32 cores; memory: 64 GB; network card: X710 10Gb * 2). The client used IPv6 and the server used IPv4, with bidirectional UDP traffic: the client sends a request to the server, and the server replies to the client. The test results for the two most critical metrics, CPS and concurrent connections, are as follows:

CPS test results:

Test results of the number of concurrent connections:

We initially launched 16 NAT64 servers in a single cluster, so a single region can reach a maximum of 3.2M CPS and 1.6G concurrent connections. In addition, the whole system supports smooth, seamless expansion: NAT64 Access switches and NAT64 servers can be added as needed.

At present, the UCloud IPv6 conversion service is in free public beta. You are welcome to try it.
