Design of the UCloud Data Center Network: Supporting 320,000 Servers in a Single Availability Zone


In October 2018, UCloud's data center infrastructure network completed the rollout of the new V4 architecture. Since then, newly built data centers (hereinafter DC) have been fully upgraded to 25G/100G networking, greatly improving DC capacity and inter-DC performance. A single availability zone under the V4 architecture can provide 320,000 server access ports, four times that of the previous V3 architecture. V4 also supports lossless network features and gives availability-zone resources horizontal expansion and rolling upgrade capabilities. Since its launch, the new architecture has effectively supported the opening of the UCloud Fujian GPU availability zone and the B/C/D expansions of the Beijing II availability zone.

Compared with cloud products, which create rich user value through software flexibility, the public cloud physical network emphasizes foresight in planning and soundness in design; its goals are simplicity, stability and efficiency. By providing an extremely reliable, uniformly addressable logical connectivity plane for the overlay virtual network, it helps the upper-layer products fulfill their "software-defined everything" mission. The details of how we designed the DCN V4 architecture around this philosophy follow below.

UCloud DCN V3 Architecture Design

UCloud's public cloud offers services with the availability zone (hereinafter AZ) as the minimum resource-pool unit; an availability zone consists of one or more data centers. The UCloud data center infrastructure network (hereinafter DCN) was upgraded to the V3 architecture in 2016, as shown in the following figure:

Figure: UCloud DCN V3 Architecture

The V3 architecture is designed to:

Fully upgraded to 10G access and 40G interconnection;

Completely removed switch stacking, avoiding its drawbacks;

Adopted a two-level CLOS, Spine-Leaf architecture to achieve a degree of horizontal scalability.

The data center core switches act as the Spine and provide standard BGP routed access; the TOR and Border switches act as Leaves. The service servers' gateways terminate on the TOR Leaf, and the DC's Border Leaf connects to the metropolitan area network's POP site, enabling DC-to-DC interconnection. One DC constitutes one availability zone.

V3 resolved the drawbacks of stacking and MC-LAG from the V2 era, its CLOS architecture provides horizontal scalability, and a unified access model across the whole network improved deployment efficiency.

After V3 went live, UCloud began building overseas nodes, and the architecture effectively supported the rapid launch of the Seoul, Tokyo, Washington, Frankfurt and other nodes within a short time.

New Challenges for V3 Architecture

In the past two years, with the rapid development of UCloud services and the maturity of 25G/100G network equipment, the services have put forward new requirements for network performance, and the V3 architecture has gradually shown some shortcomings, mainly as follows:

Insufficient performance

The development of distributed computing, real-time big data, NVMeoF, etc. requires networks to provide greater bandwidth and lower latency, as well as quality of service guarantees.

Taking NVMeoF as an example: compared with traditional storage, network storage incurs extra overhead in network device forwarding, transmission, and the TCP/IP protocol stack. The recent maturity of RDMA technology has greatly reduced TCP/IP stack overhead and improved IO performance. However, we found in practice that even slight congestion under the V3 architecture could trigger large numbers of RDMA packet retransmissions, consuming considerable bandwidth and degrading service performance. This network performance bottleneck needed to be broken.
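To see why slight congestion is so costly for RDMA traffic, note that early RoCE NICs recover from loss with go-back-N: a single dropped packet forces retransmission of the whole in-flight window. A minimal back-of-the-envelope model of this effect, with purely illustrative window and loss figures (not measurements from our network):

```python
# Simplified model of go-back-N retransmission overhead in RoCE/RDMA.
# One lost packet forces the sender to resend the entire in-flight
# window, so even a tiny loss rate wastes a large share of bandwidth.

def goodput_fraction(loss_rate: float, window_pkts: int) -> float:
    """Fraction of link bandwidth carrying useful (non-repeated) data,
    assuming each loss triggers a full-window resend (simplified)."""
    wasted_per_pkt = loss_rate * window_pkts  # avg extra packets resent
    return 1.0 / (1.0 + wasted_per_pkt)

# With a 256-packet window, a 0.1% loss rate already wastes about 20%
# of the link, and 1% loss cuts goodput to well under half:
print(goodput_fraction(0.001, 256))
print(goodput_fraction(0.01, 256))
```

This is why lossless features (PFC/ECN-style congestion control) matter so much more for RDMA than for TCP, whose selective retransmission tolerates loss far more gracefully.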

Insufficient capacity

Users want resources in an availability zone to be expandable without apparent limit, but the horizontal scalability of V3's two-level CLOS architecture is ultimately capped by the number of Spine device ports. A single DC network can accommodate roughly 10,000 to 20,000 servers, i.e. about 1,000 to 2,000 racks, while a data center campus may need to house tens of thousands or even hundreds of thousands of servers. Under the V3 architecture this means building multiple DC networks and interconnecting the DCNs through POPs, which both limits performance and raises cost.
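The Spine-port cap can be made concrete with a quick calculation. The chassis port count and servers-per-rack figures below are illustrative assumptions, not UCloud's actual hardware:

```python
# Capacity bound of a two-level CLOS fabric: every access switch (AS)
# needs one uplink to every Spine (DS), so the number of ASs, and hence
# racks, is capped by the usable downlink ports of a single DS chassis.

def two_level_capacity(ds_ports: int, servers_per_as: int) -> int:
    max_as = ds_ports                # one AS per DS downlink port
    return max_as * servers_per_as   # each AS serves one rack of servers

# e.g. a hypothetical 576-port DS chassis and ~40 servers per rack
# gives on the order of 20,000 servers, matching the DC sizes above:
print(two_level_capacity(576, 40))
```

Adding more DS chassis does not help here: each new DS demands another uplink from every AS rather than admitting more ASs.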

Insufficient flexibility

A unified access model across the whole network simplifies large-scale cabling and deployment, which does improve efficiency, but it also reduces flexibility. For example, some services require layer-2 reachability across a server cluster, while others require the classic network to run an Overlay... In short, a single uniform network plan cannot satisfy every mainstream business requirement.

Design and Optimization of DCN V4 Architecture

To solve the above problems, starting from the end of 2017 the team redesigned the DCN architecture, re-ran hardware selection and standardization, completed the full DCN V4 solution in October 2018, and deployed it in newly built data centers. The overall architecture is as follows:

Figure: UCloud DCN V4 Architecture

In the new architecture, we have mainly made the following optimizations:

1. Hardware upgrade to 25G/100G platform

From the end of 2017 to the first half of 2018, the 25G/100G network equipment of various commercial switch manufacturers gradually matured, and the price of 25G/100G optical modules tended to be reasonable. At the same time, the business demands such as GPU, real-time big data and NVMeoF exploded, and the IO bottleneck shifted from the server to the network. So we started to upgrade the hardware from 10G to 25G.

Starting from the end of 2017, we carried out selection, cross-testing and small-batch production trials of mainstream 25G/100G products from the major switch, optical module, optical fiber and server NIC vendors. Over 8 months we cross-tested more than 300 product combinations and finally settled on the complete set of 25G/100G hardware.

The Fujian GPU Availability Zone, launched this month, uses this architecture and supports 10G and 25G physical networks simultaneously. The 25G network brings higher cluster computing efficiency: overall performance roughly doubles compared with the GPU cloud hosts offered in ordinary availability zones, which matters greatly for AI training scenarios that prize absolute performance.

Figure: GPU Physical Cloud 10G/25G Gateway Cluster

2. Design of Level 3 CLOS

Figure: Level 2 CLOS

A CLOS architecture requires full-mesh connectivity between adjacent tiers. Under V3's two-level CLOS, each Leaf-layer access switch (hereinafter AS) must therefore connect to every Spine-layer core switch (hereinafter DS): with 2 DSs, 2 uplinks per AS; with a 4-DS design, each AS must run uplinks to all 4 DSs, and cabling complexity rises sharply. The overall DCN capacity thus depends on the total port count of the DS devices: the more slots a DS chassis has and the higher the port density per slot, the more servers a DCN can connect.

Figure: Level 3 CLOS

V4 instead adopts a new three-level CLOS design, inserting an aggregation layer: each aggregation switch (hereinafter CS) must connect to all DSs at the Spine layer. For example, a typical CS is a 32-port 100G device, with 16 ports uplinked to the DSs and 16 ports downlinked to the ASs:

With 2 DSs, one CS connects 8 ports to DS1 and 8 ports to DS2, 16 uplinks in total, consuming 8 ports on each DS;

With 4 DSs, the 16 uplink ports of one CS are divided into 4 groups of 4, connected to DS1/2/3/4 respectively, consuming 4 ports on each DS;

With 8 DSs, one CS consumes only 2 ports on each DS…

Evidently, the more Spine-layer devices are deployed, the fewer DS ports each CS consumes, so more CSs can be attached; other things being equal, the larger the access capacity of the whole DCN.

The change from two-level to three-level CLOS therefore raises the access capacity of the entire DCN; in theory, as hardware technology advances, the design capacity can keep growing without bound. This solves the DCN capacity problem. Under our current design, a single DC can provide at most 80,000 server access ports, and a single availability zone can reach 320,000, four times that of the DCN V3 era, enough to meet the smooth expansion needs of all UCloud regions for the next several years.
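The port arithmetic above can be checked in a few lines. The 32-port, 16-uplink CS comes from the description above; the 576-port DS chassis is a hypothetical figure for illustration:

```python
# Port accounting for the three-level CLOS aggregation layer: a CS has
# 16 uplink ports split evenly across all Spine (DS) devices, so more
# DSs means fewer ports consumed per CS and more CSs per fabric.

CS_UPLINKS = 16

def ports_per_ds(num_ds: int) -> int:
    """Ports one CS consumes on each DS (uplinks split evenly)."""
    assert CS_UPLINKS % num_ds == 0
    return CS_UPLINKS // num_ds

def max_cs(num_ds: int, ds_ports: int) -> int:
    """How many CSs fit, given the usable port count of one DS."""
    return ds_ports // ports_per_ds(num_ds)

for n in (2, 4, 8):
    print(f"{n} DSs -> {ports_per_ds(n)} ports per DS per CS")
# 2 DSs -> 8, 4 DSs -> 4, 8 DSs -> 2, as in the list above

# With a hypothetical 576-port DS, going from 2 to 8 DSs quadruples
# the number of attachable CSs:
print(max_cs(2, 576), max_cs(8, 576))
```

The total server count then scales with the number of CSs times the servers behind each CS's 16 downlink ports, which is how a single DC reaches the 80,000-port scale.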

3. Introduction of POD

Moving from two-level to three-level CLOS adds an aggregation layer. We call a group of aggregation switches, the access switches attached to them, and the racks beneath those access switches a POD. A single POD provides consistent network capabilities, including:

Consistent connectivity. Within a POD, every AS-to-CS connection is the same, such as 1×100G or 2×100G single-link interconnection; every server-to-AS connection is likewise consistent, such as 1×25G or 2×25G to the AS.

Consistent network characteristics. The network characteristics supported by a POD are the same, such as ECMP support, QoS support, and direct access to the public network.

This allows us to tailor POD offerings based on the network performance and characteristics required by the business.

For example, current service zones include the public cloud zone, physical cloud zone, managed cloud zone, gateway zone, management zone, IPv6 zone, and so on. The requirements that the public cloud, gateway, management and IPv6 zones place on the basic network are essentially the same, so under the new POD design they are all merged into the "intranet POD". For services with extremely high network IO, such as the big data and cloud storage zones, a "high-performance intranet POD" is provided, giving each server 2×25G full line-rate access with QoS and lossless network characteristics. In addition, the "integrated POD" addresses servers that need public network or other special access, and the "hybrid cloud POD" provides bare-metal or user private cloud access, meeting different business needs and solving the flexibility problem.

In general, PODs are designed around the network capabilities each service actually needs, which avoids wasted cost and controls CAPEX, while also avoiding the excessive network partitioning that would come from partitioning by service, keeping maintenance complexity in check.

4. DC Group

The UCloud public cloud resource pool has two levels: the "region" (generally a geographic city) and the "availability zone" (AZ for short; two availability zones are generally more than 10 km apart, with isolated infrastructure).

An AZ can contain multiple DCs. In practice, however, under the V3 architecture each DC connects to a POP to interwork with other DCs, requiring long optical cable runs and wavelength-division equipment, which introduces bandwidth bottlenecks and extra latency. As a result, even two DCs sitting very close together were unsuitable to pool as a single AZ, while treating them as two AZs contradicted the inter-AZ distance requirement.

Figure: Comparison before and after DC Group was generated

The V4 architecture introduces the concept of the "DC Group", which connects geographically close DCs in a full mesh and presents them externally as a single AZ. The benefits include:

Low network latency. DCs within a DC Group are very close together, usually no more than 10 km apart, and the resulting latency is below 0.1 ms;

Greater redundancy and bandwidth. Because the DCs are close and optical cable is cheap over such distances, we can add more fiber connections, ensuring ample redundancy on one hand and ample bandwidth on the other;

Rolling upgrades. By building a new-generation DC, new services can be launched within the original AZ with essentially no impact on the DCs already in operation.

For example, some time ago we released a high-performance SSD cloud disk product. During the deployment phase there were few idle cabinets in Beijing II Availability Zone D. Waiting to requisition and deploy new cabinets would have wasted valuable time; yet deploying the product only in a newly opened availability zone would have left users in the original availability zone unserved.

This contradiction can be well resolved by adding a new DC under the DC Group architecture.
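The sub-0.1 ms latency figure quoted for a DC Group follows directly from fiber propagation speed. A quick sanity check, assuming light travels at roughly 2/3 of c in silica fiber:

```python
# One-way propagation delay over metro fiber: light in glass travels
# at about c / 1.5, so 10 km of fiber adds roughly 0.05 ms each way.

C_VACUUM = 299_792_458          # speed of light in vacuum, m/s
FIBER_SPEED = C_VACUUM / 1.5    # ~2e8 m/s in silica fiber

def one_way_delay_ms(distance_km: float) -> float:
    return distance_km * 1_000 / FIBER_SPEED * 1_000

print(one_way_delay_ms(10))  # ~0.05 ms, comfortably under 0.1 ms
```

Switch forwarding adds microseconds on top of this, so the propagation term dominates and the 0.1 ms bound holds with margin at 10 km.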

Summary

In UCloud's overall network design, the goals of the underlying network are stability and efficiency. By organizing physical circuits, classic network equipment and networking technologies, the basic network forms a stable, high-performance underlay that provides IP connectivity for upper-layer services. It underpins both the machine-room infrastructure and the business above it, and must resolve the perennial contradiction between rapidly changing business demand and the difficulty of upgrading the basic network. The DCN, the data center network, is one of the most important components of this infrastructure network.

Figure: UCloud overall network design

The DCN V4 architecture we redesigned over the past year fully upgrades new DCs to 25G/100G, supports lossless network features, improves DC capacity and inter-DC performance, and gives AZ resources horizontal expansion and rolling upgrade capabilities. In short, it balances the contradiction between new demands and old structures and can meet development needs for years to come. Going forward, the infrastructure network will continue to keep pace with technology, providing an ever more stable and efficient underlay for public cloud products.
