How to understand the usage scenarios of VXLAN in OpenStack

2025-07-13 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article explains the scenarios in which VXLAN is used in OpenStack. The content is fairly detailed; interested readers can use it as a reference, and I hope it is helpful to you.

I. Preface

Before the introduction, let's first go over the concepts of underlay and overlay in networking. Underlay refers to the physical network layer, and overlay refers to the logical (virtual) network built on top of the physical network layer. The underlay requires devices in the physical network to be physically interconnected; the overlay breaks through these physical limitations and makes the network architecture more flexible. Take VLAN as an example: in a pure underlay environment, devices in different networks need to be connected to different switches, and changing the network a device belongs to means changing its cabling. With VLAN, you only need to add the device to the target VLAN to change its network, and no cabling change is needed.

II. The pain points of VLAN in the cloud environment

Insufficient number of VLAN IDs

The VLAN ID field in the VLAN header is 12 bits wide, so the theoretical limit is 4096 and the number of usable VLANs is 4094, which cannot meet the needs of a cloud environment.

VM live migration

In the cloud computing scenario, the traditional server becomes a VM running on a host. Because a VM runs in the host's memory, it can be migrated from host A to host B without interruption, as long as its IP and MAC addresses stay unchanged before and after migration. This requires the VM to stay in the same layer 2 network: in a layer 3 environment, different VLANs use different IP segments, otherwise routing would break.

Mac table entries are limited

The MAC tables of ordinary switches hold about 4k or 8k entries, which is not a bottleneck in small-scale scenarios. In a cloud computing environment, however, each physical server runs multiple VMs and each VM may have multiple vNICs, so the number of MAC addresses multiplies quickly, and the table size limit of the switch becomes a problem that must be faced.

III. How VXLAN solves these pain points

Winning by numbers

The VNI field in the VXLAN header is 24 bits wide, so in theory there are 16777216 VNIs, which solves the problem of too few VIDs.
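The ID-space arithmetic can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the text; the constant names are made up for illustration, and the two reserved VLAN IDs (0 and 4095) are the reason the usable count is 4094.

```python
# ID-space arithmetic for VLAN vs. VXLAN (illustrative constant names).
VLAN_ID_BITS = 12   # width of the VLAN ID field
VNI_BITS = 24       # width of the VXLAN network identifier (VNI) field

vlan_total = 2 ** VLAN_ID_BITS   # all possible VLAN ID values
vlan_usable = vlan_total - 2     # IDs 0 and 4095 are reserved
vni_total = 2 ** VNI_BITS        # all possible VNI values

print(vlan_total, vlan_usable, vni_total)
print(vni_total // vlan_total)   # how many times larger the VNI space is
```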

Note that in OpenStack, although the number of VNIs available on br-tun increases, the network type on br-int can only be VLAN. Traffic of every VM goes through a conversion between the internal VID and the external VNI, which maps the VNI at the tenant layer to a VID at the local (host) layer.

If you are careful, you may wonder: the number of VNIs on br-tun is 16777216, but there are only 4096 VIDs on br-int, so does introducing VXLAN make sense? The answer is yes. Given the current computing power of physical machines, a single physical machine cannot run VMs belonging to 4094 different tenants, so this mapping makes sense.

The figure above is a schematic diagram of VM communication between two compute nodes. All the VMs in the figure belong to the same tenant. Although the tenant's VNI is the same at the tenant layer, the local VID assigned to that tenant on each nova-compute node can differ. VMs of the same tenant in the same subnet on the same host reach each other without any VID/VNI conversion, while VMs of the same tenant on different hosts need VID/VNI conversion. If the VID-to-VNI correspondence were the same on all hosts, the whole cloud environment could have at most 4094 tenants, and introducing VXLAN really would be meaningless.
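The per-host mapping described above can be sketched as follows. This is a simplified model, not OpenStack code: the class `HostVlanMap`, its methods, and the VNI values are hypothetical, and the only point is that each host hands out local VIDs independently, so the same VNI can map to different VIDs on different hosts.

```python
class HostVlanMap:
    """Hypothetical per-host table mapping tenant VNIs to local VLAN IDs."""

    MAX_VID = 4094  # usable VLAN IDs on one host

    def __init__(self):
        self._vni_to_vid = {}
        self._next_vid = 1

    def local_vid(self, vni):
        """Return the local VID for a VNI, allocating one on first use."""
        if vni not in self._vni_to_vid:
            if self._next_vid > self.MAX_VID:
                raise RuntimeError("no free local VLAN IDs on this host")
            self._vni_to_vid[vni] = self._next_vid
            self._next_vid += 1
        return self._vni_to_vid[vni]

    def vni_for(self, vid):
        """Reverse lookup used when traffic leaves the host through the tunnel."""
        for vni, known_vid in self._vni_to_vid.items():
            if known_vid == vid:
                return vni
        raise KeyError(vid)


host_a = HostVlanMap()
host_b = HostVlanMap()

host_a.local_vid(70001)          # another tenant network already on host A
vid_a = host_a.local_vid(5000)   # tenant network with VNI 5000 on host A
vid_b = host_b.local_vid(5000)   # the same tenant network on host B

print(vid_a, vid_b)                                   # local VIDs differ
print(host_a.vni_for(vid_a), host_b.vni_for(vid_b))   # both map back to 5000
```

Because the VID only has to be unique within one host, the 4094 limit applies per host rather than to the whole cloud.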

Do one thing under cover of another

As mentioned earlier, live migration of a VM requires that its IP and MAC addresses not change before and after migration, so the VM must be in a layer 2 network. VXLAN is an overlay technology that encapsulates the original frame again and transmits it over UDP, so it is also called MAC-in-UDP. On the surface, the network forwards packets using the outer (encapsulating) IP and MAC addresses, but what is actually delivered is the original frame with its pre-encapsulation IP and MAC addresses.

Disappearing from the scene

In the cloud environment, the MAC table size of the access switch becomes a bottleneck, and there are only two ways to solve this problem:

1. Expand the table: higher-end switches have larger tables, but replacing the original access switches with them increases costs.

2. Hide MAC addresses: VXLAN achieves the same effect without increasing cost. As we learned earlier, VXLAN encapsulates the original frame again, and the VTEP role that implements the VXLAN function can be located on the switch or on the host where the VM runs. If the VTEP role is on the host, the access switch only learns the MAC address of the VTEP from the re-encapsulated frame, not the MAC addresses of the VMs behind it.

If the VTEP role is located on the access switch, frames are processed more efficiently, but the access switch still learns the MAC addresses of the VMs, and the table size limit is not resolved. These two situations are explained in detail later.
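A rough back-of-the-envelope comparison of the two situations is sketched below. All numbers (48 hosts, 50 VMs per host, 2 vNICs per VM, a 4096-entry table) are hypothetical, and the model assumes one outer MAC address per host-side VTEP.

```python
def macs_learned_by_access_switch(hosts, vms_per_host, vnics_per_vm, vtep_on_host):
    """Estimate how many MAC addresses the access switch has to learn."""
    if vtep_on_host:
        # Only the outer MAC of each host's VTEP is visible to the switch.
        return hosts
    # Otherwise every vNIC of every VM shows up in the switch's MAC table.
    return hosts * vms_per_host * vnics_per_vm


TABLE_SIZE = 4096  # assumed size of a "4k" MAC table

without_hiding = macs_learned_by_access_switch(48, 50, 2, vtep_on_host=False)
with_hiding = macs_learned_by_access_switch(48, 50, 2, vtep_on_host=True)

print(without_hiding, without_hiding > TABLE_SIZE)  # table overflows
print(with_hiding, with_hiding > TABLE_SIZE)        # table is nearly empty
```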

These are the reasons why vxlan is used in openstack scenarios. The implementation principle of vxlan will be described in detail below.

IV. The implementation mechanism of VXLAN

What does the vxlan message look like?

A VXLAN packet is the original frame encapsulated once more, so that layer 2 frames can be carried across a layer 3 network.

As shown in the figure above, the original frame becomes the payload of the VXLAN packet. The VXLAN header carries the VNI, the IP header carries the source and destination VTEP addresses, and the link-layer header carries the MAC address of the source VTEP and the MAC address of the next-hop device on the way to the destination VTEP.
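To make the header layout concrete, here is a minimal Python sketch that packs and parses the 8-byte VXLAN header as defined in RFC 7348 (8 flag bits, 24 reserved bits, the 24-bit VNI, 8 reserved bits). The function names are illustrative only, and the outer UDP, IP and MAC headers are left out.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header):
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8


header = pack_vxlan_header(5000)
print(header.hex())
print(unpack_vni(header))
```

The original frame would follow these 8 bytes unchanged, which is why the inner IP and MAC addresses survive the trip across the layer 3 network.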

In the public cloud architecture that the author works on, the VTEP role is implemented by OVS on the host, the host interface connected to the access switch is a trunk, and a dedicated network plane is planned for the VTEPs in the physical network.

When VM traffic passes through the VTEP, flow table rules strip the VID and add the VNI.

The VID planned for the VTEP network plane is tagged on during VXLAN encapsulation. As shown in the following figure, a VLAN tag can be set in the outer MAC header of the VXLAN packet.

What is vtep?

VTEP stands for VXLAN tunnel endpoint. VXLAN can be understood abstractly as a tunnel through the layer 3 network, and the two ends of the tunnel are VTEPs. The VTEP is the key component that implements the VXLAN function, and it can be deployed on access switches or on servers. Depending on where it is deployed, not only does the behavior regarding learning VM MAC addresses (mentioned earlier) differ, the implementation mechanism differs as well. Unless otherwise stated below, the VTEP is assumed to be deployed on the access switch; deployment of the VTEP on the server is described separately.

Establishment of vxlan Tunnel

On a physical switch, VTEP is one role of the switch; in other words, VTEP is only part of the switch's function. Not all packets need to go through the VXLAN tunnel, and packets may also be forwarded through ordinary layer 2 and layer 3 forwarding. So which packets need to go through the VXLAN tunnel?

As shown in the figure above, VXLAN introduces the concept of a "large layer 2" domain: a VXLAN tunnel needs to be established when VMs attached to two different VTEPs need to communicate. Each large layer 2 domain is called a bridge domain (BD for short), which is similar to a VLAN identified by its VID. Different BDs are identified by VNIs, and the relationship between BD and VNI is 1:1.

The configuration for creating bd and setting the correspondence between bd and vni is as follows:

#
bridge-domain 10   //create a BD numbered 10
 vxlan vni 5000   //set the VNI corresponding to BD 10 to 5000
#

Vtep generates a mapping table between bd and vni based on the above configuration, which can be viewed from the command line, as shown below:

With the mapping table, a packet entering the VTEP can be encapsulated with the right VNI according to the BD it belongs to. The question is how the VTEP determines which BD a packet belongs to.

This can be done in two ways: layer 2 subinterface access to the VXLAN tunnel, or VLAN access to the VXLAN tunnel. A layer 2 subinterface mainly does two things: first, it checks, according to its configuration, which packets need to enter the VXLAN tunnel; second, it decides how to handle the packets that pass the check.

As shown in the figure above, layer 2 subinterfaces 10GE 1/0/1.1 and 10GE 1/0/1.2 are created on the layer 2 physical interface 10GE 1/0/1, and their flow encapsulation types are configured as dot1q and untag, respectively. The configuration is as follows:

#
interface 10GE1/0/1.1 mode L2   //create layer 2 subinterface 10GE1/0/1.1
 encapsulation dot1q vid 10   //only packets with VLAN tag 10 are allowed to enter the VXLAN tunnel
 bridge-domain 10   //packets enter BD 10
#
interface 10GE1/0/1.2 mode L2   //create layer 2 subinterface 10GE1/0/1.2
 encapsulation untag   //only packets without a VLAN tag are allowed to enter the VXLAN tunnel
 bridge-domain 20   //packets enter BD 20
#

On the layer 2 physical interface 10GE 1/0/2, a layer 2 subinterface 10GE 1/0/2.1 is created with the flow encapsulation type default. The configuration is as follows:

#
interface 10GE1/0/2.1 mode L2   //create layer 2 subinterface 10GE1/0/2.1
 encapsulation default   //allow all packets to enter the VXLAN tunnel
 bridge-domain 30   //packets enter BD 30
#
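The matching logic of the three subinterfaces configured above can be modeled in a few lines of Python. This is a simplified sketch, not the switch's real algorithm: the table `SUBINTERFACES` and the function `classify` are hypothetical, and the rule that a specific match wins over `default` is an assumption of the model.

```python
# (encapsulation type, VLAN ID to match, bridge domain), keyed by physical port.
SUBINTERFACES = {
    "10GE1/0/1": [("dot1q", 10, 10), ("untag", None, 20)],
    "10GE1/0/2": [("default", None, 30)],
}


def classify(port, vlan_tag):
    """Return the BD a frame enters, or None if it does not enter a VXLAN tunnel.

    vlan_tag is the frame's VLAN ID, or None for an untagged frame.
    """
    fallback = None
    for encap, vid, bd in SUBINTERFACES.get(port, []):
        if encap == "dot1q" and vlan_tag == vid:
            return bd            # tagged frame with the configured VLAN ID
        if encap == "untag" and vlan_tag is None:
            return bd            # frame without a VLAN tag
        if encap == "default":
            fallback = bd        # catches everything not matched above
    return fallback


print(classify("10GE1/0/1", 10))    # BD 10
print(classify("10GE1/0/1", None))  # BD 20
print(classify("10GE1/0/1", 30))    # no subinterface matches
print(classify("10GE1/0/2", 99))    # BD 30
```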

Now that all the conditions are in place, you can either establish a VXLAN tunnel automatically through a protocol, or manually specify the source and destination IP addresses to establish a static VXLAN tunnel between the local VTEP and the peer VTEP. On Huawei CE series switches, this configuration is done under the NVE (Network Virtualization Edge) interface. The configuration process is as follows:

#
interface Nve1   //create logical interface NVE 1
 source 1.1.1.1   //configure the IP address of the source VTEP (the IP address of the Loopback interface is recommended)
 vni 5000 head-end peer-list 2.2.2.2
 vni 5000 head-end peer-list 2.2.2.3
#

Here VNI 5000 has two peer VTEPs, with IP addresses 2.2.2.2 and 2.2.2.3. At this point, the VXLAN tunnel is established.

The configuration of layer 2 subinterfaces at the two ends of a VXLAN tunnel is not necessarily identical. Because of this, two VMs that belong to the same network segment but to different VLANs can communicate through the VXLAN tunnel.

To sum up, layer 2 subinterfaces currently support three flow encapsulation types (dot1q, untag and default), as shown in the following table:

When there are many VNIs, this method requires a subinterface to be created for each VNI, which becomes very troublesome.

In that case, VLAN access to the VXLAN tunnel should be used. You only need to allow packets carrying these VLANs to pass on the physical interface, bind each VLAN to a BD, configure the BD-to-VNI mapping, and finally create the VXLAN tunnel.

The configuration of vlan binding with bd is as follows:

#
bridge-domain 10   //create a BD numbered 10
 l2 binding vlan 10   //bind BD 10 to VLAN 10
 vxlan vni 5000   //set the VNI corresponding to BD 10 to 5000
#

Vxlan communication flow on the same subnet

As shown in the figure above, assume that the VTEP is implemented through subinterfaces on the access switch and that VM_A communicates with VM_C for the first time. Because VM_A does not yet have the MAC address of VM_C, it sends an ARP broadcast to request it. The forwarding of the ARP request and the ARP reply is used below to illustrate how MAC addresses are learned.

The forwarding process of ARP request message is as follows:

1. VM_A sends an ARP broadcast in which the source MAC is MAC_A, the destination MAC is all Fs (the broadcast address), the source IP is IP_A and the destination IP is IP_C, requesting the MAC address of VM_C.

2. After receiving this BUM (broadcast, unknown unicast, multicast) request, VTEP_1 determines from the configuration on the layer 2 subinterface that the packet needs to enter the VXLAN tunnel, determines the BD the packet belongs to, and from the BD the VNI. At the same time, VTEP_1 learns the mapping between MAC_A, the VNI and the inbound interface (Port_1, the physical interface of the layer 2 subinterface) and records it in the local MAC table. VTEP_1 then copies the packet according to the head-end replication list and encapsulates each copy separately.

3. After the packet arrives at VTEP_2 and VTEP_3, each VTEP decapsulates it and gets the original packet sent by VM_A. At the same time, VTEP_2 and VTEP_3 learn the mapping between the MAC address of VM_A, the VNI and the IP address of the remote VTEP (IP_1), and record it in their local MAC tables. After that, VTEP_2 and VTEP_3 process the packet according to the configuration on the layer 2 subinterface and broadcast it in the corresponding layer 2 domain.

After receiving the ARP request, VM_B and VM_C check whether the destination IP address in the packet is their own IP address. VM_B finds that the destination IP is not its own, so it discards the packet; VM_C finds that the destination IP is its own, so it responds to the ARP request.

The ARP reply message forwarding process is shown in the following figure:

4. Because VM_C has already learned the MAC address of VM_A, the ARP reply is a unicast packet, and unicast packets are not replicated at the head end. The source MAC is MAC_C, the destination MAC is MAC_A, the source IP is IP_C and the destination IP is IP_A.

5. After VTEP_3 receives the ARP reply sent by VM_C, it identifies the VNI the packet belongs to (the identification process is similar to step 2). At the same time, VTEP_3 learns the mapping between MAC_C, the VNI and the inbound interface (Port_3) and records it in the local MAC table. VTEP_3 then encapsulates the packet. The outer source IP address is the IP address of the local VTEP (VTEP_3), the outer destination IP address is the IP address of the peer VTEP (VTEP_1), the outer source MAC address is the MAC address of the local VTEP, and the outer destination MAC address is the MAC address of the next-hop device on the way to the destination IP. The encapsulated packet is forwarded through the IP network according to the outer MAC and IP information until it reaches the peer VTEP.

6. After the packet arrives at VTEP_1, VTEP_1 decapsulates it and gets the original packet sent by VM_C. At the same time, VTEP_1 learns the mapping between the MAC address of VM_C, the VNI and the IP address of the remote VTEP (IP_3), and records it in the local MAC table. After that, VTEP_1 sends the decapsulated packet to VM_A.

At this point, VM_A and VM_C have learned each other's MAC addresses. From now on, VM_A and VM_C communicate in unicast mode.
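The learning steps above can be replayed with a small Python model. It is only a sketch: the class `Vtep` and its methods are hypothetical, the VTEP addresses reuse the 1.1.1.1, 2.2.2.2 and 2.2.2.3 values from the earlier NVE example, and assigning 2.2.2.3 to VTEP_3 is an assumption made for illustration.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class Vtep:
    """Hypothetical VTEP with a MAC table and a head-end replication list."""

    def __init__(self, ip, peers_by_vni):
        self.ip = ip
        self.peers_by_vni = peers_by_vni  # {vni: [peer VTEP IPs]}
        self.mac_table = {}               # (vni, mac) -> ("port", name) or ("vtep", ip)

    def from_local(self, vni, src_mac, dst_mac, port):
        """Handle a frame from a local VM; return the remote VTEPs to send it to."""
        self.mac_table[(vni, src_mac)] = ("port", port)   # learn MAC -> local port
        entry = self.mac_table.get((vni, dst_mac))
        if entry is None:
            # Broadcast or unknown destination: head-end replication to all peers.
            return list(self.peers_by_vni.get(vni, []))
        kind, where = entry
        return [where] if kind == "vtep" else []          # known unicast

    def from_tunnel(self, vni, src_mac, remote_vtep_ip):
        """Handle a decapsulated frame; learn MAC -> remote VTEP IP."""
        self.mac_table[(vni, src_mac)] = ("vtep", remote_vtep_ip)


vtep1 = Vtep("1.1.1.1", {5000: ["2.2.2.2", "2.2.2.3"]})
vtep3 = Vtep("2.2.2.3", {5000: ["1.1.1.1", "2.2.2.2"]})

# Steps 1-3: ARP request from VM_A is replicated to both peers.
request_targets = vtep1.from_local(5000, "MAC_A", BROADCAST, "Port_1")
vtep3.from_tunnel(5000, "MAC_A", vtep1.ip)

# Steps 4-6: ARP reply from VM_C is unicast back through one tunnel.
reply_targets = vtep3.from_local(5000, "MAC_C", "MAC_A", "Port_3")
vtep1.from_tunnel(5000, "MAC_C", vtep3.ip)

# Later traffic from VM_A to VM_C is plain unicast.
data_targets = vtep1.from_local(5000, "MAC_A", "MAC_C", "Port_1")

print(request_targets, reply_targets, data_targets)
```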

Vxlan communication flow in different subnets

As shown in the figure above, VM_A and VM_B belong to the 10.1.10.0/24 and 10.1.20.0/24 network segments and to VNI 5000 and VNI 6000, respectively. The layer 3 gateways of VM_A and VM_B are the IP addresses of BDIF 10 and BDIF 20 on VTEP_3, respectively. (A BDIF interface is similar to a VLANIF interface: it is a layer 3 logical interface created on a BD, used for communication between VMs in different subnets or between VXLAN and non-VXLAN networks.) VTEP_3 has routes to the 10.1.10.0/24 and 10.1.20.0/24 segments. At this point, VM_A wants to communicate with VM_B.

Since this is the first communication and VM_A and VM_B are in different network segments, VM_A first sends an ARP broadcast to request the MAC of its gateway (BDIF 10). After obtaining the gateway's MAC, VM_A sends the data packet to the gateway. The gateway in turn sends an ARP broadcast to request the MAC of VM_B and, after obtaining it, sends the data packet to VM_B. The MAC address learning here is the same as in same-subnet communication, so it is not repeated. Now assume that VM_A and VM_B have both learned the gateway's MAC, and the gateway has learned the MACs of VM_A and VM_B. The packet forwarding process between VMs in different subnets is as follows:

1. VM_A first sends the data packet to the gateway. The source MAC is MAC_A, the destination MAC is MAC_10 of the gateway interface BDIF 10, the source IP address is IP_A, and the destination IP is IP_B.

2. After receiving the data packet, VTEP_1 identifies the VNI the packet belongs to (VNI 5000) and encapsulates the packet according to the MAC table. The outer source IP address is the IP address of the local VTEP (IP_1), the outer destination IP address is the IP address of the peer VTEP (IP_3), the outer source MAC address is the MAC address of the local VTEP (MAC_1), and the outer destination MAC address is the MAC address of the next-hop device on the way to the destination IP.

3. When the packet reaches VTEP_3, VTEP_3 decapsulates it and gets the original packet sent by VM_A. VTEP_3 then processes the packet as follows:

(1) VTEP_3 finds that the destination MAC of the packet is the MAC of its own BDIF 10 interface and that the destination IP address is IP_B (10.1.20.1), so it looks up the next hop for IP_B in the routing table.

(2) It finds that the next hop is 10.1.20.10 and the outbound interface is BDIF 20. VTEP_3 then queries the ARP table, changes the source MAC of the original packet to the MAC of the BDIF 20 interface (MAC_20), and changes the destination MAC to the MAC of VM_B (MAC_B).

(3) When the packet is sent to the BDIF 20 interface, VTEP_3 recognizes that it needs to enter the VXLAN tunnel (VNI 6000), so the packet is encapsulated according to the MAC table. The outer source IP address is the IP address of the local VTEP (IP_3), the outer destination IP address is the IP address of the peer VTEP (IP_2), the outer source MAC address is the MAC address of the local VTEP (MAC_3), and the outer destination MAC address is the MAC address of the next-hop device on the way to the destination IP.

4. After the packet arrives at VTEP_2, VTEP_2 decapsulates it, gets the inner data packet, and sends it to VM_B. The process by which VM_B responds to VM_A is similar and is not repeated.
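Step 3 on the gateway (route lookup, MAC rewrite, re-encapsulation into the other VNI) can be sketched as follows. The tables and the function `route_on_gateway` are hypothetical; the subnets, VNIs and IP_B (10.1.20.1) come from the text, while the address 10.1.10.1 used for IP_A is an assumption, since the text does not give it.

```python
import ipaddress

# Layer 3 interfaces on VTEP_3 (the gateway), one per bridge domain.
BDIFS = {
    "BDIF10": {"net": ipaddress.ip_network("10.1.10.0/24"), "mac": "MAC_10", "vni": 5000},
    "BDIF20": {"net": ipaddress.ip_network("10.1.20.0/24"), "mac": "MAC_20", "vni": 6000},
}
ARP_TABLE = {"10.1.20.1": "MAC_B"}        # IP_B -> MAC_B, already learned
MAC_TABLE = {(6000, "MAC_B"): "IP_2"}     # MAC_B in VNI 6000 sits behind VTEP_2


def route_on_gateway(frame):
    """Route a decapsulated frame; return (rewritten frame, VNI, remote VTEP)."""
    gateway_macs = {bdif["mac"] for bdif in BDIFS.values()}
    if frame["dst_mac"] not in gateway_macs:
        raise ValueError("frame is not addressed to the gateway")
    dst_ip = ipaddress.ip_address(frame["dst_ip"])
    for bdif in BDIFS.values():
        if dst_ip in bdif["net"]:
            # Rewrite the inner MACs: source becomes the outbound BDIF,
            # destination becomes the target VM found in the ARP table.
            routed = dict(frame, src_mac=bdif["mac"], dst_mac=ARP_TABLE[frame["dst_ip"]])
            remote_vtep = MAC_TABLE[(bdif["vni"], routed["dst_mac"])]
            return routed, bdif["vni"], remote_vtep
    raise LookupError("no route to " + frame["dst_ip"])


frame_from_vm_a = {
    "src_mac": "MAC_A", "dst_mac": "MAC_10",
    "src_ip": "10.1.10.1",   # assumed value for IP_A
    "dst_ip": "10.1.20.1",   # IP_B
}
routed, vni, remote_vtep = route_on_gateway(frame_from_vm_a)
print(routed["src_mac"], routed["dst_mac"], vni, remote_vtep)
```

Note that only the inner MAC addresses change; the inner IP addresses are untouched, exactly as with ordinary layer 3 forwarding.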

Note that interworking between VXLAN networks and non-VXLAN networks also requires a layer 3 gateway. The difference is that packets are encapsulated on the VXLAN side but not on the non-VXLAN side. After a packet enters the gateway from the VXLAN side and is decapsulated, it is forwarded like an ordinary unicast packet.

V. Deploying the VTEP role in OVS

How ovs creates vxlan tunnels

As we learned earlier, when the VTEP is deployed on the access switch, the switch still learns the MAC addresses of the VMs, which does not solve the table size problem. This is why, in the public cloud scenario, the VTEP role is deployed in OVS on the host.

Unlike on an access switch, VXLAN tunnels do not have to be set up manually: neutron-server, which is responsible for networking in OpenStack, builds the tunnels automatically after it starts. Here is how neutron-server automatically establishes a tunnel.

As shown in the figure above, the OVS bridges on each host are created by ovs-agent. When Compute Node 1 joins the network, it first reports its network type and local_ip to neutron-server. After neutron-server receives this resource information (network, port and subnet are all called resources in neutron), it finds the other compute nodes with the same network type and has tunnels created between them, and at the same time synchronizes this information to ovs-agent on the other compute nodes.
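The report-and-synchronize behavior described above can be illustrated with a toy model. This is not neutron's actual API: the classes `Agent` and `Controller`, the method `report`, and the 192.0.2.x addresses are all made up for the sketch, which only shows how a full mesh of tunnels forms among agents with the same network type.

```python
class Agent:
    """Hypothetical stand-in for ovs-agent on one compute node."""

    def __init__(self, local_ip, network_type="vxlan"):
        self.local_ip = local_ip
        self.network_type = network_type
        self.tunnels = set()   # remote VTEP IPs this node has tunnel ports to


class Controller:
    """Hypothetical stand-in for the tunnel bookkeeping done by neutron-server."""

    def __init__(self):
        self.agents = []

    def report(self, new_agent):
        """A node reports its network type and local_ip when it comes online."""
        for other in self.agents:
            if other.network_type == new_agent.network_type:
                new_agent.tunnels.add(other.local_ip)   # tell the new node about peers
                other.tunnels.add(new_agent.local_ip)   # sync the new node to the peers
        self.agents.append(new_agent)


controller = Controller()
node1 = Agent("192.0.2.11")
node2 = Agent("192.0.2.12")
node3 = Agent("192.0.2.13")
gre_node = Agent("192.0.2.14", network_type="gre")

for node in (node1, node2, node3, gre_node):
    controller.report(node)

print(sorted(node1.tunnels))     # tunnels to node2 and node3
print(sorted(gre_node.tunnels))  # no tunnels: different network type
```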

Whenever neutron resources change, or OVS does not know how to handle some traffic, it reports to neutron-server or waits for its notification. Combined with the flow tables mentioned earlier, does this feel familiar? Yes: besides accepting API requests, neutron-server also acts as an SDN controller.

Differences from a VTEP on an access switch

1. When the VTEP is implemented with OVS, the access switch only learns the outer MAC address used by the VTEP after encapsulation, not the MAC addresses of the VMs, which solves the MAC address table size problem.

2. A physical switch establishes different tunnels through the binding of BD and VNI. In the OVS implementation, each VTEP can contain multiple VSIs (virtual switch instances), and each VSI corresponds to one VNI.

This is how vxlan is implemented in openstack through physical devices or ovs.

That concludes this look at how to understand the use of VXLAN in OpenStack scenarios. I hope the content above is of some help to you.
