This paper first analyzes the problems encountered in networking large-scale SDN data centers. On the one hand, the scale of the Underlay network is limited by the actual forwarding capacity and port density of the equipment, so a single Spine-Leaf Fabric cannot meet the needs of large-scale networking; on the other hand, in the implementation of SDN technology, Openstack and the SDN controller each impose limits on management and control capacity.
It then goes into the technical details from two aspects: the Underlay networking and routing planning of a multi-POD large-scale data center, and the implementation of cross-POD interworking SDN technology. Combined with the network traffic models involved, it describes the networking architecture of a large-scale SDN data center.
1. Problems to be solved in large-scale SDN data center networking
A large-scale SDN data center network needs to host and schedule tens of thousands of servers as a single resource pool. Considering the Underlay networking and the SDN solution, there are three main problems to solve.
(1) The Underlay networking level of the data center. Although the processing and forwarding capacity of data center switches has improved greatly with successive chip generations, given current switch port capabilities, the actual number of cabinets in each computer room, and the difficulty of cabling across computer rooms, a single two-tier Spine-Leaf network cannot carry tens of thousands of servers.
For example, suppose a mature 16-slot core switch from a mainstream vendor is selected as the Spine, with a port density of 20 ports per 100G board and 30 ports per 40G board, and a 48-port 10GE plus 6-port 40GE access switch is selected as the Leaf. The Leaf switches are fully interconnected to the Spines, the Spine layer is fully equipped with 6 core switches, and each core switch carries two 100G boards for connecting external firewalls, IP private networks or dedicated-line routers. With a bandwidth convergence ratio of 1:1, a single Spine-Leaf architecture can support at most 5760 servers, which cannot meet the carrying needs of tens of thousands of servers.
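To make the example's arithmetic concrete, here is a minimal Python sketch of this kind of capacity check. The leaf-facing board budget per Spine is an assumption chosen to stay consistent with the example's figure; the port densities, Spine count and convergence ratio come from the text.

```python
# Back-of-the-envelope capacity check for a single Spine-Leaf Fabric, in the
# spirit of the example above. Parameter values are assumptions taken from or
# inferred from the text; real designs depend on the chosen chassis and boards.

SPINE_COUNT = 6                       # fully equipped Spine core switches
PORTS_PER_40G_BOARD = 30              # 40G line-card port density
LEAF_FACING_40G_BOARDS_PER_SPINE = 8  # assumed slot budget left for Leaf uplinks
LEAF_ACCESS_PORTS_10G = 48            # server-facing 10GE ports per Leaf
LEAF_UPLINKS_40G = SPINE_COUNT        # one 40G uplink to each Spine (full mesh)
CONVERGENCE_RATIO = 1.0               # allowed downlink:uplink bandwidth ratio

# Each Leaf consumes one 40G port on every Spine, so the per-Spine 40G port
# budget caps the number of Leaf switches in the Fabric.
max_leaves = LEAF_FACING_40G_BOARDS_PER_SPINE * PORTS_PER_40G_BOARD

# Honouring the convergence ratio caps the usable server ports per Leaf.
uplink_gbps = LEAF_UPLINKS_40G * 40
servers_per_leaf = min(LEAF_ACCESS_PORTS_10G,
                       int(uplink_gbps * CONVERGENCE_RATIO // 10))

print(f"Leaves per Fabric : {max_leaves}")                      # 240
print(f"Servers per Leaf  : {servers_per_leaf}")                # 24 at 1:1
print(f"Servers per Fabric: {max_leaves * servers_per_leaf}")   # 5760
```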
(2) The management scale and scope of the SDN controller. The SDN controller maintains TCP persistent connections to the VSWs or hardware switches it manages, which consumes CPU and memory; too many managed devices exhaust the controller's resources and degrade its performance, and this is the main factor limiting the controller's management scale. The management scope is limited mainly by the network delay between the controller and the managed devices, so local deployment is recommended rather than long-distance remote management. At present, a 3-node SDN controller cluster from a mainstream vendor can manage about 2000 VSWs or 1000 hardware SDN switches.
(3) The management capability of the Openstack cloud operating system. Openstack uses a centralized message processing mechanism: every interactive operation is split down to the instruction level, but instruction concurrency is low and instructions are largely executed through a single-process queue. For example, when 100 virtual machines in the resource pool are operated on at the same time, the instructions produced by splitting these operations have to queue for execution because of the poor concurrency, the responsiveness of the Openstack system deteriorates, and the user experience suffers.
Cells technology can greatly improve the message processing efficiency of the Openstack platform: Nova can be extended to multiple processing nodes, each with an independent database, and database synchronization lets the multiple Nova nodes cooperate and work in a distributed fashion. However, Openstack performance is closely tied to each enterprise's actual R&D capability; at present, mainstream products developed from open-source Openstack manage about 500 virtualization Hosts (5000 VMs) or 3000 bare-metal servers.
2. Multi-POD networking architecture of large-scale SDN data centers
Because of the Underlay access capacity of a single Spine-Leaf structure, the management capacity of the Openstack platform, and the control scope and scale of the SDN controller, a large-scale SDN data center network has to be decomposed into several separate Spine-Leaf modules. With the help of SDN-DCI technology, these modules cooperate through a unified application layer to achieve unified management and orchestration of the whole data center resource pool. Each separate Spine-Leaf module is a separate Fabric, also called a POD (Point of Delivery).
Within a POD, the standard SDN data center architecture is used, with a separate Openstack cloud operating system and SDN controller per POD. According to the performance figures of mainstream Openstack distributions, the number of servers per POD is limited to 3000 in the bare-metal scenario and the number of Host machines to 500 in the virtualized scenario. Likewise, according to the performance of mainstream SDN controllers, the number of hardware switches per POD is kept below 1000 and the number of VSWs below 2000.
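As a rough illustration of these ceilings, the following hypothetical helper checks a planned POD against the vendor figures quoted above; the class and field names are invented for the sketch.

```python
# Illustrative check of a planned POD against the per-POD ceilings quoted
# above. The limits reflect the vendor figures cited in the text; the class,
# function and field names are hypothetical.

from dataclasses import dataclass

@dataclass
class PodPlan:
    virtualization_hosts: int   # hypervisor Host machines
    bare_metal_servers: int
    hardware_switches: int      # SDN hardware switches managed by the controller
    vswitches: int              # VSW instances managed by the controller

LIMITS = {
    "virtualization_hosts": 500,   # per Openstack instance (virtualized scenario)
    "bare_metal_servers": 3000,    # per Openstack instance (bare-metal scenario)
    "hardware_switches": 1000,     # per 3-node SDN controller cluster
    "vswitches": 2000,             # per 3-node SDN controller cluster
}

def violations(plan: PodPlan) -> list[str]:
    """Return the ceilings the plan exceeds (an empty list means it fits)."""
    return [f"{name}: {getattr(plan, name)} > {limit}"
            for name, limit in LIMITS.items()
            if getattr(plan, name) > limit]

if __name__ == "__main__":
    plan = PodPlan(virtualization_hosts=480, bare_metal_servers=0,
                   hardware_switches=120, vswitches=1900)
    print(violations(plan) or "POD plan fits within the quoted ceilings")
```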
In a multi-POD large-scale SDN data center, the Underlay inside each POD is a standard Spine-Leaf architecture. The SDN-GW of a POD can either be combined with the Spine or deployed separately alongside it, and firewalls and load-balancer devices are side-attached to the SDN-GW.
At present the SDN-GW is usually deployed as a stack of two devices so that the SDN controller can manage it as one unit. Therefore, if the POD is large and more than two Spines are needed, combining the SDN-GW with the Spine is not recommended; the SDN-GW should be deployed separately.
To interconnect traffic between PODs, east-west aggregation core switches (Core-Spine) are set up to carry cross-POD east-west traffic; to let traffic inside a POD reach external networks, north-south aggregation core switches (Out-Spine) are set up to carry north-south traffic. The number of east-west and north-south aggregation core switches can be calculated flexibly from the actual POD size, the number of PODs, and the required network convergence ratio. The links from the Spines inside a POD to the inter-POD aggregation core switches generally cross computer rooms, and 100G optical modules should be used for these interconnections to improve link utilization.
If the planned east-west traffic between PODs is very large, it is recommended to connect the Spines inside the POD directly to the east-west aggregation switches. The traffic model is then as follows: cross-POD traffic goes from the Spine inside the POD to the SDN-GW, the SDN-GW strips the original VXLAN encapsulation, re-encapsulates the traffic into the appropriate interconnection VNI and sends it back to the Spine, and the Spine finally forwards it to the east-west aggregation switch. Under this model the same flow crosses the Spine inside the POD twice, so if the planned traffic is well within the capacity of the SDN-GW switches, it is recommended to connect the SDN-GW to the east-west aggregation switches instead, which removes this transit traffic from the Spine.
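A small sketch of the two attachment options, assuming generic device names, makes the "twice through the Spine" point explicit.

```python
# Paths taken by cross-POD traffic inside the source POD under the two
# attachment options discussed above (device names are generic labels).
path_agg_on_spine = ["Leaf", "Spine", "SDN-GW (re-encapsulate)", "Spine", "EW-aggregation"]
path_agg_on_gw    = ["Leaf", "Spine", "SDN-GW (re-encapsulate)", "EW-aggregation"]

for name, path in [("aggregation attached to Spine ", path_agg_on_spine),
                   ("aggregation attached to SDN-GW", path_agg_on_gw)]:
    print(f"{name}: {' -> '.join(path)}  (Spine crossings: {path.count('Spine')})")
```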
Figure 1. Multi-POD networking architecture of a large-scale SDN data center
Within a POD, the SDN forwarding control scheme can be Openflow+Netconf or EVPN+Netconf. In the virtual machine scenario, the VSW, whose table entries are larger and more flexible, is recommended as the VTEP, so the Openflow+Netconf scheme is adopted. In the bare-metal server scenario, the hardware SDN access switch serves as the VTEP, and either EVPN+Netconf or Openflow+Netconf can be chosen according to the capabilities of the specific network devices.
In a scenario where Openflow+Netconf and EVPN+Netconf are deployed together, the SDN controller needs to translate between and bridge the two control schemes. The SDN controller establishes EVPN neighbor relationships with the SDN-GW, translates EVPN control-plane information into Openflow and pushes it to the VSWs, and translates the relevant Openflow information from the VSWs into EVPN control information and sends it to the hardware SDN switches, thereby controlling the establishment of VXLAN tunnels and data forwarding between the VSWs and the hardware SDN switches.
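Conceptually, this translation can be pictured as mapping an EVPN MAC/IP route to a simplified flow entry for the VSW and back. The sketch below is purely illustrative; the data structures and field names are hypothetical and do not correspond to any vendor's controller.

```python
# Conceptual sketch of the controller-side translation described above:
# an EVPN MAC/IP advertisement learned from the SDN-GW is turned into a
# simplified "flow entry" pushed to a VSW over Openflow, and vice versa.
# All structures and field names are hypothetical.

from dataclasses import dataclass

@dataclass
class EvpnMacIpRoute:          # simplified EVPN type-2 route content
    mac: str
    ip: str
    vni: int
    vtep_ip: str               # remote VTEP (hardware switch or SDN-GW)

@dataclass
class VswFlowEntry:            # simplified flow entry for a VSW
    match_dst_mac: str
    set_tunnel_vni: int
    set_tunnel_dst: str        # VXLAN outer destination (remote VTEP)
    action: str = "output:vxlan_port"

def evpn_to_flow(route: EvpnMacIpRoute) -> VswFlowEntry:
    """Translate an EVPN-learned endpoint into a flow entry for the VSW."""
    return VswFlowEntry(match_dst_mac=route.mac,
                        set_tunnel_vni=route.vni,
                        set_tunnel_dst=route.vtep_ip)

def flow_to_evpn(mac: str, ip: str, vni: int, local_vtep: str) -> EvpnMacIpRoute:
    """Translate a VSW-learned endpoint into an EVPN route for the hardware switches."""
    return EvpnMacIpRoute(mac=mac, ip=ip, vni=vni, vtep_ip=local_vtep)

print(evpn_to_flow(EvpnMacIpRoute("fa:16:3e:00:00:01", "10.0.0.5", 10010, "192.0.2.11")))
```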
The interconnection scheme between PODs borrows from SDN-DCI technology and adopts EVPN+VXLAN. The SDN-GW inside each POD also acts as the DCI-GW, establishing EVPN neighbor relationships with the SDN-GWs of the other PODs, and cross-POD traffic is interconnected under the control of a unified orchestration layer.
Shared distributed block, file and object storage can be planned as separate storage PODs. Traffic destined for a storage POD has its VXLAN encapsulation removed at the SDN-GW and then reaches the storage POD via Underlay routing and forwarding. A separate VRF is configured inside the POD to isolate storage-bound traffic from other business traffic. If the storage POD needs to reach external networks, its storage aggregation switches are connected to the north-south aggregation switches. FC SAN storage is recommended to be deployed directly inside each POD.
Figure 2. Storage POD networking diagram
3. Underlay networking and routing planning of large-scale SDN data centers
In the multi-POD Underlay of a large-scale data center there are a great many network devices: at roughly 500 network devices per POD, a 10-POD network exceeds 5000 devices. How the Underlay routing is planned is therefore very important for high-performance forwarding in a large-scale data center network.
In general data center scenarios the IGP of choice is OSPF: the technology is mature and construction and operations staff have rich experience with it. When OSPF is used as the IGP for a large-scale data center network, each POD should be placed in its own area, with the east-west aggregation switches forming the backbone Area 0, in order to reduce the propagation scope and number of LSAs. The SDN-GW in each POD acts as the OSPF area border router, with its interfaces split between areas: the interfaces towards the east-west aggregation switches join Area 0, and the interfaces towards the Spines of the POD join that POD's own area. The north-south aggregation switches generally work in layer-2 transparent transmission mode, with layer 3 terminating on the external firewalls, so they do not run a routing protocol.
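A minimal sketch of this area plan, assuming ten PODs and a simple POD-number-to-area mapping, might look as follows.

```python
# Sketch of the OSPF area plan described above: the east-west aggregation
# switches sit in the backbone Area 0, each POD gets its own area, and the
# SDN-GW acts as the area border router with interfaces in both.
# The POD count and area numbering are illustrative assumptions.

POD_COUNT = 10

def ospf_area_plan(pod_count: int) -> dict:
    plan = {"EW-aggregation switches": "Area 0 (backbone)"}
    for pod in range(1, pod_count + 1):
        area = pod  # e.g. POD n -> Area n
        plan[f"POD{pod} SDN-GW (ABR)"] = (f"uplink interfaces in Area 0, "
                                          f"POD-facing interfaces in Area {area}")
        plan[f"POD{pod} Spine/Leaf"] = f"Area {area}"
    return plan

for device, area in ospf_area_plan(POD_COUNT).items():
    print(f"{device:26s} -> {area}")
```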
Figure 3. OSPF routing planning for large-scale data center networking
Compared with OSPF, ISIS supports ISPF (Incremental SPF) and offers better scalability and convergence for large networks. ISIS uses flexible TLV encoding, which makes the protocol more extensible. Because of its fast convergence, clear structure and suitability for large networks, ISIS has been widely used as the IGP in metropolitan area networks and IP backbone networks, and as data centers grow it is increasingly used in data center scenarios as well. The ISIS area boundary lies on the link, and each network device belongs to only one ISIS area. To reduce the propagation scope and number of LSPs, ISIS is planned hierarchically in the large-scale data center scenario: the backbone consists of the inter-POD east-west aggregation switches and the SDN-GW of each POD. The inter-POD east-west aggregation switches run ISIS Level-2, the SDN-GW in each POD runs ISIS Level-1-2, and the Spines and Leaves inside each POD run ISIS Level-1.
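The corresponding ISIS plan reduces to a role-to-level mapping; a minimal sketch with generic role names is shown below.

```python
# Role-based ISIS level assignment matching the hierarchy described above.
# Device roles are generic labels, not vendor terminology.

ISIS_LEVEL_BY_ROLE = {
    "ew-aggregation": "Level-2",    # inter-POD east-west aggregation switches
    "sdn-gw":         "Level-1-2",  # POD border, participates in both levels
    "spine":          "Level-1",
    "leaf":           "Level-1",
}

def isis_level(role: str) -> str:
    try:
        return ISIS_LEVEL_BY_ROLE[role]
    except KeyError:
        raise ValueError(f"unknown device role: {role}") from None

for role in ("ew-aggregation", "sdn-gw", "spine", "leaf"):
    print(f"{role:15s} -> {isis_level(role)}")
```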
Figure 4. ISIS routing planning for large-scale data center networking
RFC 7938 proposes using EBGP as the routing protocol of large-scale data centers, and there are a few examples of EBGP being used as the underlay routing protocol in data centers. Unlike link-state protocols such as OSPF and ISIS, BGP is a distance-vector (more precisely, path-vector) routing protocol and therefore scales better. For small and medium-sized data centers there is little performance difference between BGP and link-state protocols such as ISIS and OSPF, but in very large data centers BGP performs better. Link-state protocols need to flood a large amount of LSA information: routing information is generated by first synchronizing LSAs and then computing routes, and when nodes change or the network is expanded and spliced together, a large number of LSAs are flooded. The distance-vector protocol BGP does not have this problem; because routes are advertised directly between BGP nodes, the network remains more stable when it is expanded and spliced.
There are IETF drafts aimed at optimizing LSA handling in OSPF and ISIS, with the goal of reducing the number and propagation scope of LSAs and improving their performance in very large data center networks, but no highly effective and practical scheme has emerged yet. Although EBGP is not yet widely used in data centers, the distance-vector protocol BGP is likely to see wider use as the underlay routing protocol of very large data centers in the future.
Planning and configuring EBGP routing is more complex than OSPF or ISIS. The Spine devices in a POD are planned into one AS, the east-west aggregation switches into another AS, and each stacked Leaf group gets its own AS number. Although the Leaves in each POD only establish EBGP neighbor relationships with the Spines of that POD, and no EBGP sessions are established between Leaves, a large amount of Leaf neighbor information still has to be configured on the Spines. This complexity of planning and configuration is one of the factors limiting the use of EBGP in data centers.
In a large-scale data center that uses EBGP as the underlay routing protocol, if EVPN+Netconf is used as the forwarding control scheme inside the POD, the EVPN inside the POD needs to be built on IBGP, so a network device must run EBGP and IBGP at the same time as two BGP processes with different AS numbers. Network devices from mainstream vendors that support dual BGP processes with different AS numbers are already available.
A regular BGP AS number is 16 bits long, with values from 0 to 65535, and the private AS range is 64512 to 65534, so only 1023 private AS numbers are available for data center planning. With one AS number per stacked Leaf group, this clearly cannot satisfy the AS-number needs of a multi-POD large-scale data center network. RFC 6793 extends the BGP AS number to 32 bits, which is more than sufficient for large-scale data center networking, and mainstream equipment already supports 32-bit AS numbers.
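A small sketch of this AS-number planning, using invented POD and Leaf-stack counts and the 32-bit private AS range, shows why the 1023 16-bit private AS numbers run out quickly.

```python
# AS-number planning sketch for the EBGP underlay described above: one AS per
# Spine group, one per east-west aggregation group, one per stacked Leaf pair.
# The 16-bit private range is exhausted quickly, so 32-bit AS numbers
# (RFC 6793; private range per RFC 6996) are used. Counts and numbering
# bases are illustrative assumptions.

PRIVATE_16BIT = range(64512, 65535)   # 1023 usable 16-bit private AS numbers
PRIVATE_32BIT_BASE = 4_200_000_000    # start of the 32-bit private AS range

print(f"16-bit private AS numbers available: {len(PRIVATE_16BIT)}")

def allocate_as_numbers(pod_count: int, leaf_stacks_per_pod: int) -> dict:
    """Assign a distinct 32-bit private AS to each group in the plan."""
    asn = PRIVATE_32BIT_BASE
    plan = {"EW-aggregation group": asn}
    for pod in range(1, pod_count + 1):
        asn += 1
        plan[f"POD{pod} Spine group"] = asn
        for stack in range(1, leaf_stacks_per_pod + 1):
            asn += 1
            plan[f"POD{pod} Leaf stack {stack}"] = asn
    return plan

plan = allocate_as_numbers(pod_count=10, leaf_stacks_per_pod=120)
print(f"AS numbers required: {len(plan)}")  # 1211 here, already above 1023
```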
Figure 5. EBGP routing planning for large-scale data center networking
The management network of an SDN data center not only provides traditional out-of-band device management but also hosts the Openstack cloud management platform and the SDN controller, so it matters more than the management network of a traditional data center. As the data center grows, the management network grows with it, so the management network of a large-scale data center also has to be deployed per POD. The core switches of each POD's management network carry the gateways of the management subnets, and the management access switches work in layer-2 VLAN transparent transmission mode. Management aggregation switches are set up between the management PODs; the management core of each POD and the inter-POD management aggregation switches are interconnected at layer 3 and can run ISIS or OSPF. To shrink the management broadcast domain inside a POD and make the management network more stable, the management subnet gateways can instead be configured on the management access switches, pushing layer 3 to the edge; the drawback is that this requires much finer-grained management address planning, and overly subdivided address planning wastes address space to some extent, so layer-3-to-the-edge management network planning is not common.
4. Interconnection between PODs of a large-scale SDN data center
A large-scale SDN data center needs to manage and schedule the resources in different PODs uniformly to build a single data-center-wide resource pool, and it uses SDN-DCI technology to interconnect the PODs.
SDN-DCI establishes cross-POD interconnection paths using EVPN+VXLAN: the control plane uses the EVPN protocol and the data plane is carried over VXLAN tunnels. The SDN-GW inside each POD also acts as the DCI-GW, and the SDN-GWs of all PODs are configured to run full-mesh EBGP. On top of EBGP, EVPN neighbor relationships are established between the SDN-GWs of the PODs, forming the interconnection control plane over which the MAC, ARP and IP-prefix routing information of the tenant VPCs (Virtual Private Clouds) is exchanged.
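Enumerating the full-mesh EBGP EVPN sessions between the SDN-GWs is mechanical; below is a minimal sketch with invented loopback addresses and AS numbers.

```python
# Enumerating the full-mesh EBGP EVPN sessions between the SDN-GWs of each
# POD, as described above. Loopback addresses and AS numbers are invented
# for illustration only.

from itertools import combinations

SDN_GWS = {                      # POD name -> (loopback, AS number); illustrative values
    "POD1": ("10.255.0.1", 4200000101),
    "POD2": ("10.255.0.2", 4200000102),
    "POD3": ("10.255.0.3", 4200000103),
}

for (pod_a, (lo_a, as_a)), (pod_b, (lo_b, as_b)) in combinations(SDN_GWS.items(), 2):
    print(f"EVPN session: {pod_a} {lo_a} (AS {as_a}) <-> {pod_b} {lo_b} (AS {as_b})")
```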
A large-scale data center deploys a unified cloud management platform that orchestrates the SDN controllers in each POD to interconnect cross-POD network traffic. Since devices from different vendors are likely to be used in a real deployment, the cloud management platform needs to interface with SDN controllers from different vendors, so a standard, open northbound API between SDN controllers and the cloud management platform needs to be defined. Following this standard interface, each vendor's SDN controller receives instructions from the cloud management platform and drives the forwarding devices in its own POD to carry them out.
Figure 6. Schematic diagram of the EVPN+VXLAN scheme for cross-POD interworking
Analyzing the cross-POD service interconnection requirements of a large-scale data center yields the following traffic models: cross-POD interworking within the same business domain and the same tenant, without passing through the private network firewall; cross-POD interworking between different tenants in the same business domain, passing through the private network firewall; cross-POD interworking between different business domains for the same tenant, passing through the private network firewall; and cross-POD interworking between different business domains and different tenants, passing through the private network firewall.
These traffic models can be summarized and simplified into two interworking models: cross-POD interworking that does not pass through the firewall, and cross-POD interworking that passes through the firewall. In the cloud management platform's cross-POD interworking service instruction template, a firewall enable switch is added to indicate whether the traffic passes through the firewall. In addition, for symmetry of the traffic model, in the firewall scenario the traffic is required to pass through the firewall on both PODs.
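The firewall enable switch can be pictured as a single boolean in the northbound instruction; the payload below is a hypothetical illustration, not any vendor's actual API.

```python
# Hypothetical northbound instruction for a cross-POD interworking request,
# illustrating the firewall enable switch mentioned above. The field names
# and payload shape are invented; no vendor API is implied.

import json

def build_interconnect_request(src_pod: str, dst_pod: str, src_vpc: str,
                               dst_vpc: str, through_firewall: bool) -> str:
    request = {
        "action": "create-cross-pod-interconnect",
        "source": {"pod": src_pod, "vpc": src_vpc},
        "destination": {"pod": dst_pod, "vpc": dst_vpc},
        # When True, both PODs steer the traffic through their local firewall
        # (the symmetric "through the firewall" model); when False, the traffic
        # is only re-encapsulated between local and interconnection VXLAN.
        "through_firewall": through_firewall,
    }
    return json.dumps(request, indent=2)

print(build_interconnect_request("POD1", "POD3", "vpc-tenant-a", "vpc-tenant-b",
                                 through_firewall=True))
```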
For cross-POD interworking traffic that does not pass through the firewall, tenant traffic is encapsulated into the local VXLAN tunnel at the local access VTEP. When it reaches the SDN-GW of the POD, the local VXLAN encapsulation is removed and the traffic is re-encapsulated into the interconnection VXLAN and sent to the SDN-GW of the peer POD. When the traffic reaches the peer SDN-GW, the interconnection VXLAN encapsulation is removed and the traffic is encapsulated into the corresponding tenant's local VXLAN tunnel. Cross-POD interworking traffic of different services must be isolated: a separate VNI and VRF are planned for each group of interworking traffic, and the VNI is bound to the VRF.
Figure 7. Traffic model for cross-POD interworking without passing through the firewall
In the traffic model for cross-POD interworking through the firewall, tenant traffic arrives at the SDN-GW of its POD, which removes the local VXLAN encapsulation and forwards the traffic to the firewall over a layer-2 VLAN. After the firewall has processed it, the traffic is sent back to the SDN-GW, which re-encapsulates it into the interconnection VXLAN and sends it to the SDN-GW of the peer POD. When the traffic reaches the peer SDN-GW, the interconnection VXLAN encapsulation is removed and the traffic is forwarded to the firewall of that POD over a layer-2 VLAN; after firewall processing it is sent back to the SDN-GW, which encapsulates it into the corresponding tenant's local VXLAN tunnel.
Figure 8. Traffic model for cross-POD interworking through the firewall
As noted above, cross-POD interworking traffic of different services must be isolated by planning a separate VNI and VRF for each group of interworking traffic and binding them. For business traffic that also needs to be handled by load-balancing equipment, the cloud management platform can uniformly steer it through the corresponding load balancer.
5. Brief overview of north-south traffic in large-scale SDN data centers
For north-south traffic in a large-scale SDN data center, the multi-POD design adds the north-south aggregation switches, which connect to the Internet firewalls, the IP private network and the dedicated-line routers respectively. For Internet-bound north-south traffic, the north-south aggregation switch works in layer-2 transparent transmission mode, with layer 3 terminating on the SDN-GW and on the external firewall respectively. For north-south traffic to and from the IP private network and dedicated lines, it can work in layer-2 transparent mode or in layer-3 mode depending on the situation; when working in layer-3 mode, VRFs need to be configured to isolate different business traffic.