
NSX Virtual Network Fault Analysis: Sharing My Troubleshooting Experience

2025-04-02 Update From: SLTechnology News&Howtos


Today's topic is fault analysis of NSX virtual networks, and my experience troubleshooting and localizing a problem. Strictly speaking, this is not end-user computing (EUC), but EUC and software-defined networking have become increasingly inseparable: more and more users build dedicated network environments for EUC products with NSX, for example by allocating a dedicated network space to a VDI compute resource pool. See my earlier blog post on using NSX to build dedicated subnets.

Recently I built an EUC lab environment on an NSX virtual network. With the logical networking that NSX provides, you can build your own networks freely, with interconnection, network segmentation, and a distributed firewall, and you never have to bother the company's network administrator. It is your own territory and you make your own decisions; of course, when something goes wrong, you also have to fix it yourself rather than turn to the network team. Here I would like to share a network failure I ran into recently. The troubleshooting process was quite interesting, and I hope it gives you some ideas for solving virtual network problems.

First, the network architecture of my lab environment is roughly as shown in the following figure.

Figure 1

The lab environment consists of five servers in three clusters, each cluster hosting EUC-related product components.

Because this is a lab environment, the Management Cluster and the Network Cluster each contain only one server. In a production environment, of course, each cluster should contain at least two servers to ensure high availability.

Figure 2

Now for the problem I ran into. One afternoon, while working in my lab environment, I could reach the external network 192.168.99.0/24 normally from vm1 on the internal network 192.168.100.0/24. By evening, none of the virtual machines on 192.168.100.0/24 could reach the external network any more.

A failure this sudden usually has a specific cause behind it. My first reaction was that routing along the north-south path might be broken. Because colleagues were running other experiments in the same environment, I first asked them to stop working in it so I could rule out interference from other factors. Then I went through the settings on the Distributed Logical Router and the Edge Gateway and found nothing unusual.

With no leads, I rebuilt a similar network environment on the same hardware, following http://www.virtualizationblog.com/nsx-step-by-step-part-16-configuring-static-route/. In this new environment, the virtual machines still could not reach external network resources.

Using ping, tracert, and similar tools, I found that every virtual machine on the intranet could reach the intranet gateway 192.168.100.1 and the downlink port 10.10.10.2 on the transit network, but not the uplink port 10.10.10.1 on the transit network. This still made me think something was wrong with north-south routing. I tried to locate where the route was broken, but still found no clue.
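The hop-by-hop check described above can be scripted so the same probes are repeated consistently after every change. The sketch below is a minimal illustration, not the author's tooling: `first_unreachable_hop` and `ping_probe` are hypothetical helpers, and `ping_probe` assumes a Linux host with the iputils `ping` (`-c 1` sends one echo request, `-W 2` waits up to two seconds for a reply). The hop list is the path order given in the text.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Path order from the article: intranet gateway, then the transit-network
# downlink port, then the transit-network uplink port.
PATH = ["192.168.100.1", "10.10.10.2", "10.10.10.1"]


def ping_probe(host: str) -> bool:
    """Send one ICMP echo request (Linux iputils ping assumed); True on reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_unreachable_hop(
    hops: Sequence[str], probe: Callable[[str], bool]
) -> Optional[str]:
    """Probe hops in path order and return the first one that fails, or None."""
    for hop in hops:
        if not probe(hop):
            return hop
    return None


# Usage on a VM inside 192.168.100.0/24:
#   first_unreachable_hop(PATH, ping_probe)
```

Passing the probe in as a function keeps the path logic separate from how reachability is tested, so the same walk can use ping, a TCP connect, or canned results. As this story shows, the first failing hop only tells you where traffic stops, not why.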

After wasting most of the day, I took another look at east-west traffic. I found that some virtual machines on the same intranet 192.168.100.0/24 could communicate with each other and some could not. That made me suspect a problem in the virtual network built by NSX, for example an IP address used by a VXLAN Tunnel End Point (VTEP) being taken by another device. I went back to the official troubleshooting guide, https://pubs.vmware.com/NSX-62/topic/com.vmware.ICbase/PDF/nsx_62_troubleshooting.pdf, but it was too long to read in full, and I did not manage to locate the problem by following its steps. In hindsight, the document is useful: by troubleshooting each subsystem separately, from the bottom up, you should be able to find the cause of the failure.
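The VTEP hypothesis can be checked quickly by listing every VTEP address next to any other IP assignments known for the same subnet (for example from an IPAM export or ARP tables) and looking for addresses with more than one owner. A minimal sketch follows; the owner names and the 172.16.0.0/24 addresses are hypothetical, and the function only reasons over the data you give it rather than querying NSX.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_ip_conflicts(assignments: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return {ip: owners} for every IP address claimed by more than one owner."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for owner, ip in assignments:
        if owner not in owners[ip]:  # ignore the same owner listed twice
            owners[ip].append(owner)
    return {ip: names for ip, names in owners.items() if len(names) > 1}


# Hypothetical inventory: (owner, IP) pairs gathered from the lab.
INVENTORY = [
    ("esxi-a VTEP", "172.16.0.11"),
    ("esxi-b VTEP", "172.16.0.12"),
    ("esxi-c VTEP", "172.16.0.13"),
    ("test-vm (static IP)", "172.16.0.12"),
]

print(find_ip_conflicts(INVENTORY))
# prints {'172.16.0.12': ['esxi-b VTEP', 'test-vm (static IP)']}
```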

So I went back to east-west traffic and tried to find a pattern in which virtual machines could and could not talk to each other. Sure enough, there was one: the virtual machines on the intranet 192.168.100.0/24 in the Management Cluster and the Workload Cluster could all communicate with each other, but none of them could communicate with the virtual machines on 192.168.100.0/24 in the Network Cluster. As shown in Figure 1, vm1, vm3, vm4, and vm5 could communicate with each other, but not with vm2. Because all the north-south network components also run on the physical server hosting vm2, it looked as if every virtual machine on ESXi server 192.168.99.12 had become an isolated island on the network, which would also explain the north-south failure. From this, it was reasonable to suspect a problem with the network interfaces on that machine.
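The pattern-finding step above amounts to grouping the pairwise ping results by the host each VM runs on and asking which host's VMs cannot reach anything elsewhere. A minimal sketch, assuming reachability is symmetric (an unordered pair counts as working if pings between the two succeed); vm1 to vm5 and 192.168.99.12 come from the text, while the other host names are hypothetical placeholders because the exact placement is only shown in Figure 1.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def isolated_hosts(vm_host: Dict[str, str], reachable: Set[FrozenSet[str]]) -> Set[str]:
    """Return hosts whose VMs cannot reach any VM running on another host."""
    isolated = set()
    for host in set(vm_host.values()):
        local = [vm for vm, h in vm_host.items() if h == host]
        remote = [vm for vm, h in vm_host.items() if h != host]
        if remote and not any(
            frozenset((a, b)) in reachable for a in local for b in remote
        ):
            isolated.add(host)
    return isolated


# VM placement: only vm2 -> 192.168.99.12 is from the article; other hosts are placeholders.
VM_HOST = {
    "vm1": "mgmt-host",
    "vm2": "192.168.99.12",
    "vm3": "workload-host-1",
    "vm4": "workload-host-2",
    "vm5": "workload-host-3",
}
# Observed: vm1, vm3, vm4 and vm5 reach each other; nothing reaches vm2.
REACHABLE = {frozenset(pair) for pair in combinations(["vm1", "vm3", "vm4", "vm5"], 2)}

print(isolated_hosts(VM_HOST, REACHABLE))  # {'192.168.99.12'}
```

Pointing at a single host rather than a single VM is what shifts suspicion away from NSX configuration and toward that host's physical interfaces.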

In my lab, each server has four network card interfaces, and the first is used as the ESXi vmkernel interface. That NIC could not be the broken one; otherwise I would not have been able to reach vm2 through vCenter at all.

Figure 3

NSX virtual networks are built on the vSphere Distributed Switch, and a distributed switch can assign a different physical NIC as the uplink on each physical host. On the physical host where vm2 runs, the virtual network 192.168.100.0/24 uses the second physical network interface, NIC2, as its uplink.

Figure 4
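To know which physical port to suspect, you have to resolve a logical network to a NIC on a specific host. On a vSphere Distributed Switch this is a two-step lookup: the port group's teaming policy names the active dvUplinks, and each host maps those dvUplinks to its own physical NICs. A minimal sketch of that lookup, with hypothetical port group, dvUplink, and host names; only the fact that 192.168.100.0/24 leaves vm2's host (192.168.99.12) through NIC2 comes from the text.

```python
from typing import Dict, List

# Hypothetical teaming policy: active dvUplinks per distributed port group.
PORTGROUP_ACTIVE_UPLINKS: Dict[str, List[str]] = {
    "pg-192.168.100.0-24": ["dvUplink2"],
}

# Hypothetical per-host mapping of dvUplinks to physical NICs.
# The same dvUplink can land on a different physical NIC on each host.
HOST_UPLINK_TO_NIC: Dict[str, Dict[str, str]] = {
    "192.168.99.12": {"dvUplink1": "NIC1", "dvUplink2": "NIC2"},
    "workload-host-1": {"dvUplink1": "NIC1", "dvUplink2": "NIC3"},
}


def physical_nics(portgroup: str, host: str) -> List[str]:
    """Resolve a port group to the physical NICs that carry it on one host."""
    mapping = HOST_UPLINK_TO_NIC[host]
    return [mapping[uplink] for uplink in PORTGROUP_ACTIVE_UPLINKS[portgroup]]


print(physical_nics("pg-192.168.100.0-24", "192.168.99.12"))    # ['NIC2']
print(physical_nics("pg-192.168.100.0-24", "workload-host-1"))  # ['NIC3']
```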

With a reasonable suspicion in hand, I needed to establish the facts. Together with Luke, I worked out a reverse-verification method: on the physical host running vm2, reassign the physical network interface used by the ESXi management network. The default is NIC1; switch it to NIC2, NIC3, and NIC4 in turn, and watch the host's connection status in vCenter each time. If the host loses its connection in vCenter, that physical network port is faulty.

Figure 5
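The reverse-verification procedure can be written as a loop that moves the management network onto each candidate NIC, records whether vCenter still sees the host, and restores NIC1 before the next attempt. This is a sketch of the procedure's logic only: `move_mgmt_to` and `host_connected` are hypothetical callables standing in for the manual steps in the vSphere Client, not a real vSphere API. Keep in mind that if a move cuts the host off from vCenter, restoring it may have to be done from the host's console rather than remotely.

```python
from typing import Callable, Dict, List, Sequence


def verify_nics(
    candidates: Sequence[str],
    move_mgmt_to: Callable[[str], None],
    host_connected: Callable[[], bool],
    default_nic: str = "NIC1",
) -> Dict[str, bool]:
    """Try each candidate NIC as the management uplink; True means the host stayed connected."""
    results: Dict[str, bool] = {}
    for nic in candidates:
        try:
            move_mgmt_to(nic)
            results[nic] = host_connected()
        finally:
            # Always return to the known-good NIC before the next candidate.
            move_mgmt_to(default_nic)
    return results


def faulty_nics(results: Dict[str, bool]) -> List[str]:
    """NICs on which the host lost its vCenter connection."""
    return [nic for nic, ok in results.items() if not ok]
```

Returning to the known-good NIC between attempts means each test starts from a working management connection, so one bad port does not distort the result for the next.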

The verification proved that all three network cards on the server, NIC2, NIC3, and NIC4, had hardware faults. Three NICs failing on one server is remarkable bad luck. Still, I have to say VMware's software proved reliable: with the hardware on one server broken, the virtual network on the other servers kept working.

The rest of the work was simple. I picked up the phone and asked the IT engineer to replace the network card. The problem was solved, and I went back to tinkering in my own territory.

I hope walking through my process of fault analysis, troubleshooting, and resolution is helpful to everyone.

About the author: Sam Zhao, EUC Solutions Department Manager. He has 15 years of IT experience in software development, testing, project management, customer project implementation, and technical marketing. He has published seven patents and is a co-author.
