2025-02-21 Update From: SLTechnology News&Howtos
SRIOV introduction, VF pass-through configuration, and packet forwarding rate performance test
Original article by Brother Slow; reprints are welcome.
Catalogue
▪ 1. SRIOV introduction
▪ 2. Environment description
▪ 3. Turn on SRIOV
▪ 4. Generate VF
▪ 5. VF pass-through
▪ 6. Turn on irqbalance
▪ 7. VM migration
▪ 8. Bandwidth speed limit
▪ 9. Security
▪ 10. Other usage restrictions
▪ 11. Performance testing
▪ 12. Windows virtual machines using VF
▪ 13. Operation and maintenance commands
▪ 14. Host blocks the VF driver
▪ Appendix: Test method for packet forwarding rate
▪ Appendix: Reference documentation
1. SRIOV introduction
▷ The bottleneck of the traditional approach: qemu's NIC is traditionally a tap device bridged to a host bridge. Performance is poor, and the packet forwarding rate in particular is very low, so it is hard to satisfy scenarios with high performance requirements. The immediate cause is that the path is too long and packets pass through too many kernel devices; the root cause is that the linux/unix kernel itself was not designed for high-performance forwarding — linux/unix is better suited to the control plane than to the forwarding plane.
▷ The solution: shorten the path. The simplest and most effective way is to bypass the kernel, and that is exactly what SRIOV does: it bypasses the host kernel.
▷ PF and VF: each physical NIC (such as p1p1) is a PF. After SRIOV is enabled, each PF can generate a fixed number of VFs, and each VF can be used directly as a NIC on the host, or passed straight through into a QEMU virtual machine as a NIC inside the VM. This is how the host kernel gets bypassed.
First, the conclusion of the performance test. Compared with the traditional tap+bridge scheme, SRIOV VF pass-through improves:
▷ packet sending rate: by 677%
▷ packet receiving rate: by 171%
2. Environment description
Model: Dell PowerEdge R620
Network card: Intel X520 (82599ES)
Host OS:CentOS 7
VM OS:CentOS 7
3. Turn on SRIOV
① Enable SRIOV in the BIOS (as shown in the figure).
Note: even with global SRIOV enabled in the BIOS, the NIC can still be used as an ordinary NIC.
② VT-d must also be enabled in the BIOS.
③ Configure iommu in grub:
iommu=pt intel_iommu=on
4. Generate VF
# bring the NIC up
ip link set p1p1 up
# view the pci number of the pf
lshw -c network -businfo
# check the number of vf supported by the NIC
cat /sys/bus/pci/devices/0000:41:00.0/sriov_totalvfs
# generate vf; it is recommended to add this to the boot scripts
echo 63 > /sys/bus/pci/devices/0000:41:00.0/sriov_numvfs
Note: if the host VF driver is not blocked, it takes a while after generating the vf before all NICs show up with proper names on the host (until then you will see a pile of ethX NICs). The more vf, the longer the wait; 63 vf take about 10 seconds.
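The grub change in step ③ can be applied on CentOS 7 roughly as follows (a sketch only; grubby and the grub2 paths shown are the stock CentOS 7 ones, not taken from the original environment):

```shell
# add the iommu parameters to every installed kernel's command line
grubby --update-kernel=ALL --args="iommu=pt intel_iommu=on"
# alternatively, append them to GRUB_CMDLINE_LINUX in /etc/default/grub and run:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
# then reboot for the change to take effect
```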
5. VF pass-through
If qemu is managed through libvirt, there are three ways to configure it:
▷ method 1 (interface): add an interface element to the devices section of the vm's xml.
The pci address used there can be found with "lshw -c network -businfo", e.g.:
pci@0000:41:10.0  p1p1_0
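As an illustration, a method-1 interface element for the VF at pci@0000:41:10.0 might look like this (the mac and vlan id are placeholder values, not from the original environment):

```xml
<interface type='hostdev' managed='yes'>
  <!-- mac and vlan are optional; values here are placeholders -->
  <mac address='52:54:00:11:22:33'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x41' slot='0x10' function='0x0'/>
  </source>
  <vlan>
    <tag id='100'/>
  </vlan>
</interface>
```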
▷ method 2 (hostdev): add a hostdev element to the devices section of the vm's xml.
The pci address in it is likewise taken from "lshw -c network -businfo".
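A hedged sketch of a method-2 hostdev element for the same VF address:

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x41' slot='0x10' function='0x0'/>
  </source>
</hostdev>
```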
▷ method 3 (net-pool)
Define a net-pool for each PF NIC, that is, edit a separate xml file for each one. Only one PF is shown here: edit sriov-int.xml to define a network named sriov-int.
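A minimal sketch of what sriov-int.xml could contain, using libvirt's hostdev-mode network over the PF p1p1:

```xml
<network>
  <name>sriov-int</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='p1p1'/>
  </forward>
</network>
```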
Add it to the libvirt net-pool, activate it, and set it to start at boot:
virsh net-define sriov-int.xml
virsh net-start sriov-int
virsh net-autostart sriov-int
Although net-autostart is configured, it does not actually work: when the physical machine boots, libvirt usually starts before the vf are generated (assuming the vf are generated in rc.local), and this net-pool (sriov-int) can only start after the vf exist. It is therefore recommended to add the following to rc.local to guarantee the startup order:
ip link set p1p2 up
echo 63 > /sys/bus/pci/devices/0000:41:00.0/sriov_numvfs
virsh net-start sriov-int
Then reference the net-pool in the vm's xml.
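As an illustration, the vm's xml could pull a VF from the pool with an interface element like this:

```xml
<interface type='network'>
  <source network='sriov-int'/>
</interface>
```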
How to choose among the three methods:
▷ method 1: the most flexible; mac and vlan can be configured
▷ method 2: mac and vlan must be set by running the ip command on the host
▷ method 3: has 2 problems
▪ There is a bug: when the total number of VFs of a given PF used by all vm on the host exceeds the VF limit, no error is reported and the vms can still start, but anomalies may occur. If a vm is then shut down with destroy, its VF breaks: for example, resetting it with "ip link set p1p1 vf 0 mac 00:00:00:00:00:00" prompts "RTNETLINK answers: Cannot allocate memory". This is hard to repair, and even after a repair it is unclear whether invisible anomalies remain.
▪ There is no direct way to know which vf a vm is using, so to rate-limit a vf or toggle its spoofchk you first have to find the vf number on the host with "ip link show dev p1p1 | grep <mac address>", and only then apply the speed limit or other settings.
To sum up: method 3 is the most convenient to use, but because of the bug you need some extra logic to prevent the total number of vf used by vms from exceeding the limit.
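The mac-to-vf lookup just described can be scripted. Below is a hedged sketch: vf_index_by_mac is a hypothetical helper that parses "ip link show dev <pf>" output; the vf_lines variable holds canned sample output (an assumption about the output format) so the parsing is visible without real hardware — on a real host you would pipe `ip link show dev p1p1` in instead.

```shell
#!/bin/sh
# find the VF index carrying a given mac by parsing ip-link vf lines on stdin
vf_index_by_mac() {  # arg: mac address
  awk -v mac="$1" '$1 == "vf" && $0 ~ mac { print $2 }'
}

# canned sample of "ip link show dev p1p1" vf lines (assumed format)
vf_lines='    vf 0 MAC aa:bb:cc:dd:ee:01, spoof checking on
    vf 1 MAC aa:bb:cc:dd:ee:02, spoof checking on'

echo "$vf_lines" | vf_index_by_mac "aa:bb:cc:dd:ee:02"
```

With the sample input above, the lookup prints the vf number (here, 1), which can then be fed to `ip link set p1p1 vf <n> ...`.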
6. Turn on irqbalance
The x520 has 2 queues per VF and the x710 has 4. You need to enable the interrupt-balancing service (irqbalance) inside the vm, otherwise only one cpu will process packets.
Note: this is unrelated to the query_rss setting of the vf on the host.
7. VM Migration
A pass-through NIC is a PCI device, and libvirt/qemu do not support migrating a vm that carries a non-USB PCI device — neither cold nor hot migration. Live migration is therefore impossible.
For cold migration there are two options:
▷ detach the vf NIC, migrate the vm with libvirt, then attach a vf NIC again on the new host after the migration
▷ undefine the vm, then re-render and define the vm on the new host
Note: libvirt's migration feature cannot be used while the vm is shut down; doing so sometimes makes the virtual machine disappear from both the original host and the new host.
8. Bandwidth speed limit
Only outbound bandwidth can be limited, not inbound.
ip link set p1p1 vf 0 max_tx_rate 100
This limits outbound bandwidth to 100Mbps. The behavior varies by NIC:
▷ x520: minimum limit 11Mbps, maximum 10000Mbps; 0 means unlimited. Values below 11 or above 10000 report an error.
▷ x710: minimum limit 50Mbps, maximum 10000Mbps; 0 means unlimited. Values below 50 are silently raised to 50; values above 10000 report an error.
Note: a vf's bandwidth limit is not reset when the vm shuts down.
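The per-NIC rules above can be encoded in a small sketch. vf_rate_check is a hypothetical helper (not a real tool): given a NIC model and a requested Mbps value, it echoes the rate that would take effect, or "error" where `ip link` would refuse the value.

```shell
#!/bin/sh
# model the documented max_tx_rate behavior of x520 and x710
vf_rate_check() {
  model=$1; rate=$2
  case $model in
    x520)
      if [ "$rate" -eq 0 ]; then echo 0                      # 0 = unlimited
      elif [ "$rate" -lt 11 ] || [ "$rate" -gt 10000 ]; then echo error
      else echo "$rate"; fi ;;
    x710)
      if [ "$rate" -eq 0 ]; then echo 0                      # 0 = unlimited
      elif [ "$rate" -gt 10000 ]; then echo error
      elif [ "$rate" -lt 50 ]; then echo 50                  # silently raised to 50
      else echo "$rate"; fi ;;
  esac
}

vf_rate_check x520 5     # prints: error
vf_rate_check x710 5     # prints: 50
vf_rate_check x710 100   # prints: 100
```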
9. Security
Only source-mac filtering and mac anti-tamper protection are supported; there is no other security protection (anti-arp-spoofing cannot be achieved).
Source mac filtering
ip link set p1p1 vf 0 spoofchk on
With this set, packets sent from the vm are dropped unless their source mac is the specified mac. Note: a vf's spoofchk is not reset when the vm shuts down.
Mac anti-tampering
▷ if the mac is modified on the host, the mac inside the vm does not change; if the mac is modified inside the vm, the change is visible on the host
▷ if the mac is changed while the vm is off, it switches to the vm's mac when the vm is started, and back to the original mac when the vm is shut down again
▷ the mac can be changed inside the vm only when the vf's mac as seen on the host is all zeros, even if the vf's spoofchk is off. There is one exception: with method 2 above, the vf's mac seen on the host is not zero, yet it can still be modified inside the vm
▷ once the mac is set on the host, the mac inside the virtual machine can no longer be tampered with
▪ with method 1 (interface), the mac is presumably set on the host automatically when the vm starts, so tamper protection works out of the box
▪ with method 2 (hostdev), you must set the mac on the host manually once more to get tamper protection
To modify the mac manually on the host (works whether the vm is off or on):
ip link set p1p1 vf 0 mac aa:bb:cc:dd:ee:ff
Recommendations:
▷ reset the vf before the vm starts
▷ reset the vf once after the vm is undefined
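A hedged sketch of the "reset the vf" recommendation. reset_vf is a hypothetical helper; it prints the ip commands it would run (dry-run), so no hardware is touched. The chosen defaults (all-zero mac, spoofchk on, no rate limit) are assumptions about what a clean vf state looks like.

```shell
#!/bin/sh
# print (not execute) the commands that would return a vf to a clean state
reset_vf() {  # args: <pf> <vf-index>
  pf=$1; vf=$2
  echo "ip link set $pf vf $vf mac 00:00:00:00:00:00"
  echo "ip link set $pf vf $vf spoofchk on"
  echo "ip link set $pf vf $vf max_tx_rate 0"
}

reset_vf p1p1 0
```

To actually apply the reset, replace each echo with the command itself (or pipe the output to sh).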
10. Other usage restrictions
▷ a vf NIC passed through into a vm cannot be attached to a linux bridge inside the vm, which also makes ebtables unusable there; iptables still works
▷ a vf NIC passed through into a vm can join an ovs bridge
▷ a single vm supports at most 32 vf; exceeding that reports an error
11. Performance testing
Test method:
▷ several vms send packets simultaneously while one vm receives; sending and receiving performance are observed separately
▷ the sending vms are on one host, the receiving vm on another host
▷ testing tool: pktgen (modprobe pktgen)
▷ test packet: udp, 64 bytes
Configuration:
▷ all vms are 4-core 8G
▷ all physical NICs are x520 (vf queues default to 2)
▷ irqbalance is enabled and numad disabled on both host and vm
▷ no cpu pinning, no numa pinning
▷ hugepages enabled
Test results:
Test conclusion:
Using SR-IOV VF pass-through significantly improves the packet forwarding rate. In the 1-to-1 test, sending reaches 3.5Mpps and receiving reaches 1.9Mpps.
▷ sending is 1196% higher than vxlan and 677% higher than vlan; this is the 1-to-1 result (1 sending vm, 1 receiving vm)
▷ receiving is 363% higher than vxlan and 171% higher than vlan; this is the 3-to-1 result (3 sending vms, 1 receiving vm)
Description:
▷ the kernel's single-core packet (64B) processing capacity is about 2Mpps
▷ receiving tops out near 2Mpps because that is the kernel-mode single-core bottleneck; with dpdk it can exceed 2Mpps. Reason: the receiving side must spread packet interrupts across cpus. Method: with multiple queues, each queue gets its own interrupt and can be assigned to a separate cpu (irqbalance balances this automatically); flows that differ in mac and ip hash to different queues and hence different interrupts. So with one VF, 2 queues, and at least 2 cores in the VM, the theoretical maximum is 4Mpps once the load-balancing conditions (distinct mac and ip) are met.
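The 4Mpps ceiling is just the number of queues times the per-core limit; a one-liner to make the arithmetic explicit (both numbers taken from the text above):

```shell
# theoretical receive ceiling = VF queue count x per-core kernel limit
queues=2          # an x520 VF has 2 queues
per_core_mpps=2   # kernel single-core limit from the tests above
echo "$((queues * per_core_mpps)) Mpps"
```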
More test results:
The packet size used in the following tests is 64B
▷ kernel mode, layer 3 forwarding performance: the sender uses different source ip
▪ BCM57800:2Mpps
▪ Intel X520:10Mpps
▪ Intel X710:12Mpps
▷ kernel mode, layer 2 forwarding performance: the sender uses different source mac
▪ BCM57800:2Mpps
▪ Intel X520:7.3Mpps
▪ Intel X710:7.8Mpps
▷ kernel mode, vxlan encapsulation capability
▪ the vxlan inner layer uses different source ips to send packets
▪ sending rate: 1.1-1.2Mpps
▷ dpdk user mode, layer 2 forwarding performance: the sender uses different source ip
▪ BCM57800: not supported
▪ Intel X520:14.8Mpps
▪ Intel X710:14.8Mpps
▷ SR-IOV mode
▪ X520 total: 11.2Mpps; each vm gets 11.2Mpps divided by the number of vms (i.e. the number of VFs in use)
Summary:
▷ the basis for interrupt balancing in kernel mode: layer 2 by source mac, layer 3 by source ip
▷ kernel-mode single-core forwarding limit with traditional interrupts: 2Mpps
Note:
▷ in kernel mode, using multi-queue RSS interrupt balancing to raise throughput drives cpu usage very high
▷ in user mode (dpdk), even with a fixed source mac or source ip, throughput stays close to the 14.8Mpps line rate
▷ vxlan cannot use multiple cores to raise throughput, mainly because the outer header does not have enough distinct source ips
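The 14.8Mpps figure is the 10GbE line rate for 64-byte frames: each frame occupies 64B plus 20B of overhead (8B preamble + 12B inter-frame gap) on the wire. A quick check of the arithmetic:

```shell
# 10 Gbit/s divided by the bits each 64B frame occupies on the wire (64 + 20 bytes)
awk 'BEGIN { printf "%.2f Mpps\n", 10e9 / ((64 + 20) * 8) / 1e6 }'
```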
12. Windows virtual machines using VF
Download the corresponding driver from the NIC vendor's website and install it. In testing, win2012 ships with an 82599 (x520) driver by default, but the version is old.
13. Operation and maintenance commands
# check the number of vf supported by the NIC
cat /sys/bus/pci/devices/0000:41:00.0/sriov_totalvfs
# view the correspondence between vf and pf: fetch
# https://github.com/intel/SDN-NFV-Hands-on-Samples/blob/master/SR-IOV_Network_Virtual_Functions_in_KVM/listvfs_by_pf.sh
# and run it
./listvfs_by_pf.sh
# after blocking the VF driver, check which VF are in use
./listvfs_by_pf.sh --status
# check which socket the NIC sits on
lstopo --no-graphics
# view NIC information by pci address
lspci -Dvmm | grep -B1 -A4 Ethernet
# view per-VF traffic on the host (only supported on x520; not found on x710)
ethtool -S p1p1 | grep VF
14. Host blocks the VF driver
echo "blacklist ixgbevf" >> /etc/modprobe.d/blacklist.conf
This means the ixgbevf driver is not loaded by default at boot; if you manually run modprobe ixgbevf, the driver will still load.
If ixgbevf is already loaded and you want to unload it, do the following:
echo 0 > /sys/bus/pci/devices/0000:41:00.0/sriov_numvfs
rmmod ixgbevf
echo 63 > /sys/bus/pci/devices/0000:41:00.0/sriov_numvfs
Appendix: Test method for packet forwarding rate
pktgen: send packets with pktgen (modprobe pktgen) and measure the receive side with sar -n DEV; udp packets are sent.
#!/bin/bash
NIC="eth2"
DST_IP="192.168.1.2"
DST_MAC="52:54:00:xx:xx:xx"   # replace with the target mac

pgset() {
    local result
    echo $1 > $PGDEV
    result=`cat $PGDEV | fgrep "Result: OK:"`
    if [ "$result" = "" ]; then
        cat $PGDEV | fgrep Result:
    fi
}

# ---- Config Start Here ----
# thread config: each CPU has its own pktgen thread
PGDEV=/proc/net/pktgen/kpktgend_0
echo "Removing all devices"
pgset "rem_device_all"
echo "Adding ${NIC}"
pgset "add_device ${NIC}"

# device config
CLONE_SKB="clone_skb 1000000"
# NIC adds 4 bytes CRC
PKT_SIZE="pkt_size 64"
# COUNT 0 means forever
COUNT="count 0"
# delay 0 means maximum speed
DELAY="delay 0"

PGDEV=/proc/net/pktgen/${NIC}
echo "Configuring $PGDEV"
pgset "$COUNT"
pgset "$CLONE_SKB"
pgset "$PKT_SIZE"
pgset "$DELAY"
pgset "dst ${DST_IP}"
pgset "dst_mac ${DST_MAC}"

# Time to run
PGDEV=/proc/net/pktgen/pgctrl
echo "Running... Ctrl^C to stop"
pgset "start"
echo "Done"
# Results can be viewed in /proc/net/pktgen/${NIC}
▷ change NIC ("eth2") at the top of the script to the NIC that sends the packets
▷ change DST_IP ("192.168.1.2") at the top of the script to the target ip
▷ change DST_MAC at the top of the script to the target mac
pktgen-dpdk
# fixed ip, fixed mac
set 0 dst ip 192.168.10.240
set 0 src ip 192.168.10.245/24
set 0 dst mac c8:1f:66:d7:58:ba
set 0 src mac a0:36:9f:ec:4a:28

# variable source mac / source ip
stop 0
range 0 src ip 192.168.0.1 192.168.0.1 192.168.200.200 0.0.0.1
range 0 dst ip 10.1.1.241 10.1.1.241 10.1.1.241 0.0.0.0
range 0 dst mac c8:1f:66:d7:58:ba c8:1f:66:d7:58:ba c8:1f:66:d7:58:ba 00:00:00:00:00:00
range 0 src mac a0:36:9f:ec:4a:28 a0:36:9f:ec:4a:28 a0:36:9f:ec:ff:ff 00:00:00:00:01:01
range 0 src port 100 100 65530 1
range 0 dst port 100 100 65530 1
range 0 size 64 64 64 0
enable 0 range
enable 0 latency

# send at a rate of 50%
set 0 rate 50
start 0

Appendix: Reference documentation
# openstack restrictions on sriov
https://docs.openstack.org/mitaka/networking-guide/config-sriov.html
# migration
https://wenku.baidu.com/view/d949db67998fcc22bcd10dfd.html
https://www.chenyudong.com/archives/live-migrate-with-pci-pass-through-fail-with-libvirt-and-qemu.html
# sriov configuration
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/virtualization_host_configuration_and_guest_installation_guide/sect-virtualization_host_configuration_and_guest_installation_guide-sr_iov-how_sr_iov_libvirt_works
# line-rate
http://netoptimizer.blogspot.tw/2014/05/the-calculations-10gbits-wirespeed.html