
How does Tungsten Fabric support large-scale cloud platforms | TF


Click to download the document containing all the relevant materials for this article: https://163.53.94.133/assets/uploads/files/large-scale-cloud-yy.pdf

Today's sharing is more technical. First we will look at the nature of SDN, and then, from the perspective of the Tungsten Fabric (hereinafter referred to as TF) architecture, analyze why it performs better than Neutron OVS and why it can support larger-scale scenarios.

First, let's look at the network requirements of a cloud. The first is tenant isolation. IaaS is multi-tenant, and the requirement for address reuse can also be met in the traditional way with VLANs. However, traditional VXLAN or OVS only provides layer-2 isolation, not layer-3 isolation: as soon as a machine is bound to a public IP or attached to the public routing level, it can reach other tenants at layer 3, so tenant isolation also requires isolation at layer 3.

Second, the cloud needs the network to support the migration of virtual machines across racks. Stretching layer 2 across the data center with VXLAN is not impossible, but besides the network requirements there are storage requirements, which makes it harder. What is the most difficult part of migrating virtual machines across racks? The traditional network architecture is access-aggregation-core: below the router there is a layer-2 domain, and machines can be migrated between racks within it. But when the cloud in a data center grows large enough, a single layer-2 underlay cannot carry the whole cloud; different racks sit in different layer-3 subnets, yet virtual machine migration requires that the IP address does not change.

In addition, there are requirements for network functions and services. There are shared resource pools on the cloud. Taking the load balancer as an example: do you virtualize one powerful hardware load balancer for multiple tenants, or give each tenant its own small load-balancer virtual machine instance? These requirements call for network virtualization and network function virtualization to support the operation of IaaS.

Network virtualization mainly applies Overlay technology, which is one of the techniques SDN uses. The more standard encapsulations include VXLAN, GRE, STT and GENEVE, as well as the MPLS over UDP and MPLS over GRE used by TF. The basic approach is to encapsulate the layer-2 frame on the vTEP and transmit it to the peer through a tunnel such as VXLAN.

The first advantage is that the number of tenants increases: traditional VLAN offers 4096 IDs, while VXLAN's 24-bit VNI offers 2^24, about 16 million. Second, the underlying transport runs over an IP network, which is far more scalable than a layer-2 network, so the network can grow and virtual machines can be migrated across different physical network segments. But this also brings work: how do the devices on both sides establish communication? First, a VXLAN tunnel has to be established, which is what the SDN controller does; second, the MAC address information on both sides has to be exchanged.

Consider two VMs on two different hosts; Overlay solves the layer-2 communication problem between them. For example, if a VM on ESX1 wants to communicate with one on ESX2, the ping first sends an ARP request: what is the corresponding MAC address? For VXLAN there needs to be a forwarding table: a frame destined for MAC2 is encapsulated toward the IP of VTEP2, transmitted to VTEP2, decapsulated, and then delivered to the corresponding machine. How is this forwarding table set up? That is what SDN needs to do.
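
To make the forwarding table concrete, here is a minimal Python sketch of the lookup a vTEP performs; the table layout, MAC addresses and VTEP addresses are invented for illustration, and this is not TF or OVS code:

```python
# Illustrative only: a vTEP's MAC-to-remote-VTEP forwarding table,
# modeled as a plain dictionary per VXLAN network identifier (VNI).
forwarding_table = {
    # VNI 5001: destination MAC -> IP of the remote VTEP that hosts it
    5001: {
        "52:54:00:aa:bb:02": "10.0.1.12",   # VM2 lives behind VTEP2
        "52:54:00:aa:bb:03": "10.0.1.13",   # VM3 lives behind VTEP3
    },
}

def forward(vni: int, dst_mac: str) -> str:
    """Return the action a vTEP would take for a layer-2 frame."""
    remote_vtep = forwarding_table.get(vni, {}).get(dst_mac)
    if remote_vtep is None:
        # Without an entry, a flood-and-learn vTEP would broadcast
        # the frame to every tunnel peer (the costly path).
        return "flood to all tunnel peers"
    # With an entry, the frame is VXLAN-encapsulated and sent
    # directly to the owning VTEP, which decapsulates it locally.
    return f"encapsulate and send to {remote_vtep}"

print(forward(5001, "52:54:00:aa:bb:02"))  # encapsulate and send to 10.0.1.12
print(forward(5001, "52:54:00:ff:ff:ff"))  # flood to all tunnel peers
```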

One way is self-learning: when VM1 sends an ARP request and it arrives at the vTEP, the vTEP floods the request to every vTEP with which it has established a tunnel; whoever answers with an ARP reply reveals the peer's address, and the forwarding table can be built. But this consumes a lot of resources. ARP entries also age out, and once an entry ages the learning has to happen again; the intuitive symptom is that the first ping takes a long time. In a large-scale network, frequent flooding poses a big challenge to network reliability.

In most cases, SDN is therefore combined with the cloud management platform: the platform already knows which virtual machines run on which virtualization server, so the MAC table or forwarding table can be pushed to the vTEP devices in advance. This is what the SDN control plane does.
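
As a rough illustration of that control-plane idea (the placement data and function names are hypothetical, not an actual TF or Neutron interface), a controller that already knows where each VM runs can pre-compute the entries each vTEP needs instead of relying on flooding:

```python
# Illustrative sketch: the cloud platform knows which VM (and MAC) runs on
# which hypervisor, so a controller can pre-compute MAC-to-VTEP entries
# and push them to every other vTEP ahead of time.
placement = [
    # (vm_name, mac_address, hypervisor_vtep_ip, vni)
    ("vm1", "52:54:00:aa:bb:01", "10.0.1.11", 5001),
    ("vm2", "52:54:00:aa:bb:02", "10.0.1.12", 5001),
]

def build_pushes(placement):
    """For every vTEP, list the remote MAC entries it should be given."""
    vteps = {vtep for _, _, vtep, _ in placement}
    pushes = {vtep: [] for vtep in vteps}
    for _, mac, owner_vtep, vni in placement:
        for vtep in vteps:
            if vtep != owner_vtep:            # a vTEP only needs remote MACs
                pushes[vtep].append((vni, mac, owner_vtep))
    return pushes

for vtep, entries in build_pushes(placement).items():
    print(vtep, "learns", entries)   # stand-in for an XMPP/OVSDB push
```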

In the Open vSwitch solution, OVSDB commands are issued through the database and RabbitMQ to set up the corresponding flow tables, so the virtual machines know where traffic should go. TF has a corresponding mechanism, which will be described later.

The job of the data plane is to forward layer-2 frames according to the forwarding table, and to encapsulate and decapsulate packets. I would like to mention MTU here. MTU is a layer-2 value, the maximum transfer unit of a frame; why should it be set so large? The VXLAN protocol adds an outer IP header, a UDP header and a VXLAN header; if the packet is not adjusted, the frame exceeds 1500 bytes and the NIC will not send it. In the early days of OpenStack we often had to troubleshoot MTU problems. The later solution was to set a parameter through the DHCP agent: if the NIC MTU is 1500, the guest defaults to a 14xx value, automatically reduced so that the virtual machine's data frames can pass through the layer-2 network. So what is an appropriate MTU? Current best practice is to set it to 9000. The vTEP is where encapsulation and forwarding happen, and according to actual tests, raising the MTU greatly improves throughput.
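
A quick back-of-the-envelope sketch of the arithmetic behind those numbers, assuming standard IPv4 VXLAN header sizes (GRE and MPLS over UDP differ slightly):

```python
# Why a 1500-byte underlay MTU is too small for VXLAN traffic.
OUTER_IP  = 20   # outer IPv4 header added by the vTEP
OUTER_UDP = 8    # outer UDP header
VXLAN_HDR = 8    # VXLAN header
INNER_ETH = 14   # the guest's own Ethernet header, carried inside the tunnel

OVERHEAD = OUTER_IP + OUTER_UDP + VXLAN_HDR + INNER_ETH   # 50 bytes

def guest_mtu(underlay_mtu: int) -> int:
    """Largest guest IP MTU that avoids fragmentation on the underlay."""
    return underlay_mtu - OVERHEAD

print(guest_mtu(1500))   # 1450 -> the "14xx" value the DHCP agent hands out
print(guest_mtu(9000))   # 8950 -> with jumbo frames there is ample headroom
```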

I just talked about what SDN does: controlling the forwarding table and the transport. Now let's talk about how SDN is classified. By school, SDN can be divided into software and hardware; the difference is whether the vTEP lives in the vRouter or in the hardware forwarding plane of the switch, that is, where packets are encapsulated and decapsulated. Hardware solutions generally use proprietary protocols, such as Cisco's OpFlex. The software side also has many different projects, of which TF is the most production-ready.

Classified by controller, there are centralized and distributed ones. Centralized controller protocols include OpenFlow and OVSDB, while TF uses XMPP (a chat protocol). Neutron's Open vSwitch solution can be understood as using the OVSDB protocol: Neutron sends the signaling to the specific compute node through RabbitMQ, and the OVS agent on the compute node adds the corresponding flow table to the OVSDB switch through OVSDB commands.

Distributed controllers, such as EVPN-VXLAN, use MP-BGP.

Which form of controller is better? Which current projects use OpenDaylight? Huawei, H3C and others use it, and the SDN architecture of their controllers was built with reference to OpenFlow. Some manufacturers have their own R&D capability and, based on their own hardware, can develop a relatively complete product. But in the open-source community it is rare to see a successful OpenDaylight deployment; it only provides a framework and some components and cannot be quickly turned into a running system straight from the open-source project. In fact, OpenFlow is more of a concept; judging from the development of SDN in recent years, it has not become a de facto standard.

On the contrary, OVSDB is used both in Neutron's software control and in switch control. For example, in TF's early BMS implementation, the virtual machine communicates with bare metal through a VLAN tag: traffic goes to the TOR switch and then reaches the virtual network through a VLAN-to-VXLAN conversion, and that conversion is configured through the OVSDB protocol. In theory, any switch that supports the OVSDB protocol can serve the BMS scenario.

I personally prefer the distributed kind of controller, because a centralized one will always have bottlenecks, whether in software or in performance. The core protocol of EVPN-VXLAN is MP-BGP, and how scalable is BGP? Look at today's Internet backbone: what it runs is the BGP protocol.

With MP-BGP, the controller distributes the forwarding information and the devices then interact through the BGP protocol; when using it, you only need to design your architecture with the corresponding BGP scalability in mind.

TF is actually a fusion of the centralized and distributed schools: it uses a centralized architecture internally while using distributed controller technology for its external connections.

To summarize the SDN requirements of large-scale cloud platforms: first, the network infrastructure must be scalable. If a layer-2 architecture is adopted, the bottleneck lies in the infrastructure, limited by the number of switch ports, and a layer-2 switch relies on spanning tree. For a large-scale network platform with a weak operations team, accidentally patching in a loop is very dangerous.

Second, the controller must be scalable. Whether centralized or distributed, it must be architecturally scalable; how far it can actually scale is a matter of code implementation.

Third, there should be no centralized single point in the data plane. In the practical application of SDN, operations is a big challenge. Whether you come from a development or an operations background, the most important thing is to understand what the data flows look like between virtual machines, and between virtual machines and external networks, and which commands to use to inspect that traffic when troubleshooting, just as traditional network operators used to capture packets. As for extensibility: to implement traditional network functions such as SNAT and floating IP, each project takes a different approach, and if there is no single point in the implementation, the architecture can scale.

Fourth, the network must extend across clusters. No matter how far one architecture scales, there is always a limit, and a single cluster will eventually hit a bottleneck. If you want to build something larger, you scale multiple clusters horizontally into one large resource pool. The question then is whether the networks need to be interconnected: if you deploy a highly available business across clusters, how does it interwork? Some mainstream high-availability components need layer-2 connectivity across two clusters, and SDN is required to support such requirements.

In terms of infrastructure, if you use TF in a production environment, simply choose an IP Clos architecture; there is no need to bother with layer-2 fabric architectures. IP Clos brings sufficient scalability and high performance, involves no vendor lock-in, and basically never needs to be touched again after it is built.

A leaf is a top-of-rack switch, and below it sits a subnet; different racks use different subnets, and the upper layer communicates through layer-3 gateway routing. The key question is how the layer-3 leaf-spine fabric exchanges routes. Juniper Networks has a white paper introducing IP Clos that compares OSPF, IS-IS and BGP on the control plane; its recommendation is to use eBGP for the interaction.
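
To illustrate that design, here is a small sketch of the eBGP sessions such a fabric would need, assuming one common numbering scheme (a shared spine ASN and one private ASN per leaf); the scheme and numbers are assumptions for illustration, not taken from the white paper:

```python
# Illustrative eBGP peering plan for an IP Clos underlay: every leaf gets
# its own private ASN and peers with every spine, so routes between racks
# are exchanged purely at layer 3.
SPINE_ASN = 65000
LEAF_ASN_BASE = 65101

def peering_plan(num_spines: int, num_leaves: int):
    """Yield (leaf, leaf_asn, spine, spine_asn) eBGP sessions."""
    for leaf in range(num_leaves):
        leaf_asn = LEAF_ASN_BASE + leaf
        for spine in range(num_spines):
            yield (f"leaf{leaf + 1}", leaf_asn, f"spine{spine + 1}", SPINE_ASN)

for session in peering_plan(num_spines=2, num_leaves=4):
    print(session)   # 8 eBGP sessions: each leaf peers with both spines
```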

The essence of TF is SDN based on MPLS VPN. The original MPLS VPN was designed to interconnect multiple sites, with BGP as the control plane. In the data center, for interconnection inside the cloud, a physical machine can be regarded as a site, and different physical machines use VPNs to establish different tunnels to reach each other; this is the essence of TF. The difference lies in the control plane: TF uses a centralized controller and controls the vRouter's routing tables through the XMPP protocol.

Take a brief look at TF's functional architecture. In terms of multi-cloud support it covers VMware, OpenStack, containers and BMS; in terms of network functions it supports layer-2 networks, layer-3 networks, DHCP, DNS, QoS, firewall, LB and so on.

At deployment time the controller is mainly divided into two kinds of nodes. One is the Analytics Cluster, the analysis nodes, which mainly do flow visualization. The other is the Controller node, which controls the network and is divided into Configuration and Control nodes. The former provides the API and interfaces with cloud management platforms such as Neutron: when a workload is created through the API, it is recorded in the Configuration database and converted into the corresponding IF-MAP. The Control node then creates the corresponding interface on the vRouter through XMPP and distributes the interface information to the different vRouters, external gateways and hardware devices. The controller's core protocol is BGP.

The controller acts as a route reflector for the whole TF deployment. All the layer-2 interface and MAC information and the layer-3 routing information are stored here and distributed to the different vRouters: vRouters inside the cloud are pushed to via XMPP, while gateways outside the cloud are pushed to via BGP or NETCONF.
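
A toy dispatcher capturing that split; it is purely illustrative and not the TF code path:

```python
# The route-reflector-style controller talks a different southbound
# protocol depending on what kind of peer it is addressing.
def push_protocol(peer_type: str) -> str:
    protocols = {
        "vrouter": "XMPP",            # compute nodes inside the cloud
        "gateway": "BGP",             # routers/gateways outside the cloud
        "switch-config": "NETCONF",   # device configuration, not routes
    }
    return protocols.get(peer_type, "unsupported peer")

for peer in ("vrouter", "gateway", "switch-config"):
    print(peer, "->", push_protocol(peer))
```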

Among them, the NETCONF protocol is not used to push routing information; it is mainly used to push configuration to network hardware devices. For example, if a BMS server is added in TF and connected to a TF-managed virtual network, the Device Manager in TF will configure the VLAN interface on the TOR switch and establish the interface through NETCONF. The TF controller then configures the corresponding VLAN-VXLAN bridging gateway through the OVSDB protocol or EVPN-VXLAN. If the virtual network needs to be extended to the gateway, NETCONF also helps create the corresponding routing-instance configuration. The exchange of information at the routing level is still done through BGP.

The databases used by TF have only emerged in roughly the last eight years: distributed components such as Cassandra, ZooKeeper and RabbitMQ are highly available and scalable in architecture, and how far they can be scaled depends on the code implementation. Neutron, by contrast, is essentially a database that records all the information and pushes out all the flow tables.

TF's data plane forwards mainly through the vRouter. The vRouter agent process runs in user space, maintains an XMPP connection through which the controller sends it information, and then programs the kernel forwarding plane. Both layer-2 and layer-3 isolation are provided here: if virtual machines of the same network are attached to the same vRouter, their communication is completed locally, while virtual machines belonging to different tenants' networks are attached to different VRFs and routing instances and cannot communicate.
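
A minimal sketch of why per-tenant VRFs give this isolation even with overlapping addresses; the VRF names and prefixes are made up, and a real vRouter does a longest-prefix match rather than the exact match used here:

```python
# Each tenant network gets its own VRF with its own routing table,
# so a lookup never crosses into another tenant's table.
vrfs = {
    "tenant-a:net1": {"10.0.0.0/24": "local", "0.0.0.0/0": "tenant-a-gw"},
    "tenant-b:net1": {"10.0.0.0/24": "local"},   # same prefix, different tenant
}

def lookup(vrf: str, prefix: str) -> str:
    table = vrfs.get(vrf, {})
    return table.get(prefix, "drop: no route in this VRF")

print(lookup("tenant-a:net1", "10.0.0.0/24"))   # local -> handled by the vRouter
print(lookup("tenant-b:net1", "0.0.0.0/0"))     # drop -> tenant-b has no route out
```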

The vRouter has many built-in network features, such as DNS. TF's DNS resolves according to the host's DNS configuration, so if you run into problems, check whether the host's DNS is healthy. DHCP responses are also handled here. Neutron is different: a dedicated DHCP agent runs on a node, and at large scale, once OVS has more than about 1500 interfaces attached, packet loss basically starts to occur and problems are frequent.

Security Groups are implemented in OVS through iptables on an associated Linux bridge, while in the vRouter they are implemented through a built-in ACL function. Network Policy is a distributed firewall. Floating IP is likewise a NAT performed in the vRouter on the way out.

Let me say a bit more about Link-Local. What is its scenario? As a cloud service provider, you need to offer customers NTP services, APT and YUM sources, and other public services. How can a virtual machine inside a virtual network access such a public service? One way is to open up routing to these networks, because you need a vTEP exit that imports the public routes through the cloud gateway, but that brings a problem: all APT and download traffic would then flow through the cloud gateway. TF instead provides a Link-Local mode using a 169.254 address, which by the network standard is valid for only one hop. OpenStack and AWS virtual machines also provide their metadata services through 169.254. The local machine has no route for it; instead the vRouter does a layer of NAT on the address, and the local NAT maps the configured Link-Local entry to the internal service being accessed.
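
A small sketch of the Link-Local idea, with invented addresses and ports; the real mapping is configured in TF rather than hard-coded like this:

```python
# The vRouter NATs a well-known 169.254.x.x address (reachable only on the
# local hop) to a real service, so traffic to NTP/APT/YUM mirrors never has
# to hair-pin through the cloud gateway.
link_local_map = {
    ("169.254.169.10", 123): ("192.168.200.5", 123),    # NTP
    ("169.254.169.20", 80):  ("192.168.200.6", 8080),   # package mirror
}

def translate(dst_ip: str, dst_port: int):
    """Return the real (ip, port) the vRouter would rewrite the flow to."""
    return link_local_map.get((dst_ip, dst_port), (dst_ip, dst_port))

print(translate("169.254.169.10", 123))   # ('192.168.200.5', 123)
print(translate("8.8.8.8", 53))           # unchanged: not a link-local service
```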

Without any enhancements, the measured performance of Neutron's OVS VXLAN (optimized only for MTU) peaks at around two gigabits, while the vRouter can reach roughly seven to eight gigabits without any optimization. Of course, you can also optimize further with DPDK or a smart NIC, or use SR-IOV passthrough.

Let's look at TF's packet interaction. Whether the virtual machines are on the same network segment or different segments, traffic between them is forwarded at the vRouter level and never passes through a centralized gateway, so at the level of data interaction between virtual machines there is no single point.

Another important scenario is SNAT. In OpenStack, a virtual machine attached to a router can reach external network segments through the router's SNAT function. The TF vRouter itself does not provide SNAT, but TF still implements the function: it forwards through an NS router, an iptables-based network-namespace router running on a compute node. If a virtual machine wants to reach a network beyond this gateway, traffic first goes to the NS router, is translated there, and then reaches the external network through the vRouter. The creation of the NS router relies on the OpenStack Nova scheduler.
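
Here is a minimal sketch of the many-to-one translation such an NS router performs, mimicking what an iptables MASQUERADE rule does; the router address and port range are assumptions, and this is not the actual TF or Neutron implementation:

```python
# Every outbound flow from a private address is rewritten to the router's
# address plus a unique port, so replies can be mapped back.
import itertools

ROUTER_IP = "203.0.113.10"        # hypothetical external address of the NS router
_ports = itertools.count(20000)   # naive port allocator
_nat_table = {}                   # (private_ip, private_port) -> public_port

def snat(private_ip: str, private_port: int):
    """Return the (public_ip, public_port) the flow is rewritten to."""
    key = (private_ip, private_port)
    if key not in _nat_table:
        _nat_table[key] = next(_ports)
    return ROUTER_IP, _nat_table[key]

print(snat("10.0.0.5", 54321))   # ('203.0.113.10', 20000)
print(snat("10.0.0.6", 54321))   # ('203.0.113.10', 20001) -> no collision
```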

Each network gets its own NS router to do this forwarding; if the traffic volume is too large, the bottleneck may be in that network's NS router, but it will not affect other networks.

Anything outside the cloud can be called the public network here, and there are two ways to reach it. The first is Floating IP: the vRouter performs a NAT and publishes the post-NAT IP through an MPLS over GRE tunnel to a VRF on the cloud gateway, which then communicates with the public network.

In the second external interconnection scenario, if you provide a cloud service you can assign different floating IPs for different carriers; if you offer an L3 VPN or L2 VPN direct-connect service, you can reach different MPLS networks through the cloud gateway and route the virtual network into the corresponding VRF, connecting the whole network. This is also where TF is powerful: MPLS VPN interconnects naturally with traditional networks.

There are two main modes of cross-cluster interconnection, or multi-cloud interconnection.

The first is interconnection between controllers: by establishing eBGP connections between the controllers, VPN, network and interface information is exchanged. This scheme is called Federation.

The vRouters on both sides only need layer-3 reachability. For example, when B1 on one side accesses B3 on the other side, the two sides belong to different controllers and VPNs. An MPLS VPN has route targets: one side exports and the other imports, each routing table sees the other side's routes, routing information is exchanged, and a layer-2 or layer-3 connection is established. The Federation scheme is implemented at the controller level and is more suitable for the same region or the same data center, where the clusters are relatively closely connected.

The second mode is interconnection through cloud gateways: eBGP neighbors are established between the different VRFs on the gateways, and different import/export RTs are configured manually to achieve cross-cloud connectivity. Note that the two sides are different clusters with independent IP address management, so addresses must be assigned in a way that avoids IP overlap.
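
A short sketch of the route-target matching that stitches two clusters together, plus the overlap check the paragraph warns about; the RT values and prefixes are illustrative:

```python
# A route is accepted into a local VRF only if one of its export RTs
# matches the VRF's import list.
def import_routes(remote_routes, import_rts):
    """Keep only the remote routes whose export RTs intersect our imports."""
    return [r for r in remote_routes if set(r["export_rts"]) & set(import_rts)]

cluster_b_routes = [
    {"prefix": "10.1.0.0/24", "export_rts": ["64512:100"]},
    {"prefix": "10.2.0.0/24", "export_rts": ["64512:200"]},
]

# Cluster A's VRF imports only RT 64512:100, so it learns just one prefix.
accepted = import_routes(cluster_b_routes, import_rts=["64512:100"])
print([r["prefix"] for r in accepted])    # ['10.1.0.0/24']

# Because address management is independent per cluster, a sanity check for
# overlapping prefixes is still needed before wiring the VRFs together.
local_prefixes = {"10.1.0.0/24"}
overlaps = [r["prefix"] for r in accepted if r["prefix"] in local_prefixes]
print(overlaps)   # ['10.1.0.0/24'] -> this overlap would have to be resolved
```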

Finally, comparing feature for feature against Neutron OVS, TF can be said to be a complete winner.

In terms of the underlay network, TF can scale, while Neutron OVS today can only use a layer-2 network. Whether centralized or distributed, floating IPs sink to the compute-node layer, and the current components have no mature BGP solution for publishing floating IPs to the border gateway; OVS DVR and the border gateway can only be connected at layer 2.

Compared in terms of architecture, TF is scalable: if three controller nodes are not enough, it can be extended to five. Neutron OVS is also a highly available architecture, with database HA through a MySQL cluster and API HA through K8s, but its computing logic is not distributed and relies heavily on RabbitMQ. In DVR mode, four agents are deployed on each compute node, bringing more topics, which is a great challenge for RabbitMQ's performance. As soon as there is a RabbitMQ outage or network jitter, the cluster recovery mechanism kicks in immediately, and RabbitMQ dies very quickly.

In addition, on the forwarding plane the performance of Open vSwitch is not as good as the vRouter's; TF has more network features, while native Neutron LBaaS components such as Octavia still need to mature; for multi-cloud interconnection TF builds on MPLS VPN; and for interaction with network devices, Neutron only has Ironic's networking-generic-switch driver, while TF supports BGP, NETCONF, EVPN-VXLAN and other standards-based protocols, covering devices from Juniper Networks, Cisco, Huawei, Ruijie and other manufacturers.

That's all for today. Thank you!

More TF Meetup lectures:

The realistic Dilemma of Multi-Cloud Interconnection and the Road of Open Source SDN (record of the first TF Meetup speech)

Tungsten Fabric: the Golden key to CMP (TF Meetup speech record)

Follow the Wechat official account: TF Chinese Community
