Shulou, 2025-01-17 update. From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 report
This article walks through the analysis of a real switch loop fault and the lessons learned from it, which we think are very practical.
Network paralysis or interruption caused by loops is a problem we frequently encounter in data center operations. In a large network, however, this kind of fault is often well hidden, which makes it hard to locate and resolve quickly.
The service desk suddenly received calls reporting that multiple business systems could not reach their virtual machines. When we arrived at the company's equipment room, we found abnormal traffic on the network equipment: many ESXi hosts had suddenly disconnected from the vCenter (VC) console, SMS alarms were arriving continuously, and some vCenter and ESXi hosts could no longer be managed.
Fault description
Logging in to the aggregation layer switch, we found abnormal traffic on some ports and gateway address conflict alarms.
Troubleshooting
These checks placed the fault, preliminarily, in the network element (NE) segment. By checking port registrations, we found that the affected equipment was concentrated on NE switch No. 2, a Huawei S3952. In equipment room No. 7, we found a problem with the cascade port of the Cisco 2960, the NE aggregation switch connected to it. With the point of failure located, we initially assumed a minor issue such as a broken network cable or a hung port. We reseated the cable, but the fault persisted; replacing the cable did not fix it either.
We then connected a laptop to each of the two cascade ports with a network cable. The Cisco cascade port stayed Down, while the Huawei cascade port came Up normally. Suspecting the Cisco cascade port, we picked an unused port on the Cisco switch and configured it. This time the two switches connected and the port came up. We thought the problem was solved, but shortly afterwards the port went Down again.
Next, we moved the cable back to the original interface and ran shutdown and no shutdown on it. The port came up, which showed that the port itself was fine. The Cisco switch log, viewed through the console port, indicated a loop in the network. Watching the port status more closely, we saw it flapping: as soon as CPU utilization on the Huawei S3952 reached 100%, the switch port went Down.
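The flapping seen on the cascade port can be quantified by counting link state changes inside a sliding time window. The sketch below is a minimal illustration; the event format, the 10-second window, and the threshold of 5 changes are assumptions chosen for the example, not values taken from either vendor's switches.

```python
from typing import List, Tuple

# Assumed event format: (timestamp in seconds, "up" or "down")
LinkEvent = Tuple[float, str]

def transition_times(events: List[LinkEvent]) -> List[float]:
    """Return the timestamps at which the link state actually changed."""
    changes = []
    previous = None
    for ts, state in sorted(events):
        if previous is not None and state != previous:
            changes.append(ts)
        previous = state
    return changes

def is_flapping(events: List[LinkEvent], window_s: float = 10.0, threshold: int = 5) -> bool:
    """True if at least `threshold` state changes fall inside any `window_s`-second window."""
    changes = transition_times(events)
    start = 0
    for end in range(len(changes)):
        # Advance the left edge until the window spans at most window_s seconds
        while changes[end] - changes[start] > window_s:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False
```

A port that bounces every second trips the check, while a port that changes state a couple of times over several minutes does not.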
Fault handling
With the point of failure finally located, the rest was simple. Checking the traffic on each port of the S3952, we found abnormal traffic on ports 14 and 15, which connect to the IMS 3328 switch. We shut down these two ports, and the network returned to normal. The equipment room staff confirmed that the BSC and CE were working normally, but the RNC equipment was not. On the RNC switch (a 3560) in the second-floor equipment room, we found that its cascade port was Down. Drawing on the earlier experience, and since the loop had already been eliminated, we restarted that port, which fully cleared the fault.
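Finding ports 14 and 15 came down to spotting ports whose traffic stood far above the rest. A minimal sketch, assuming per-port broadcast rates (packets per second) have already been collected, for example from interface counters; the function name, the 10x-median rule, and the floor value are hypothetical choices for illustration.

```python
from statistics import median
from typing import Dict, List

def anomalous_ports(pps_by_port: Dict[int, float], factor: float = 10.0,
                    floor: float = 100.0) -> List[int]:
    """Return ports whose broadcast rate exceeds `factor` times the median
    rate across all ports (and is at least `floor` packets per second)."""
    if not pps_by_port:
        return []
    baseline = median(pps_by_port.values())
    limit = max(factor * baseline, floor)
    return sorted(port for port, pps in pps_by_port.items() if pps > limit)
```

Using the median rather than the mean keeps a couple of storming ports from dragging the baseline up and hiding themselves.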
Malfunction analysis
How did the loop form? The investigation showed that engineers, after running cabling for new AC equipment, had connected the cable to the switch without authorization. The AC devices had not yet been configured with VRRP or other data, so the heartbeat line between the AC and the switch formed a layer-2 loop. Because loopback detection (loopback-detection) is disabled by default on switches such as the Huawei models, nothing blocked the loop, and the failure followed.
Further reading showed that Cisco switches enable error-disable detection (including loop detection) by default and automatically shut a port when a loop is detected. In this incident, it was precisely because the Cisco NE aggregation switch shut down its looped interface in time that core NE equipment such as the MSC, MGW, and HLR was unaffected. However, once the loop is gone, the blocked port does not reopen automatically; it has to be restarted manually.
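Whether a new cable creates a layer-2 loop can be checked by treating switches and devices as graph nodes and links as edges: a link whose two ends are already connected closes a cycle. The sketch below uses a union-find structure. The device names in the usage describe an assumed topology, not the actual wiring from this incident, and it assumes the AC bridges traffic between its service port and its heartbeat port so that both cables act as layer-2 paths.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]

def find(parent: Dict[str, str], node: str) -> str:
    """Union-find lookup with path halving; registers unseen nodes."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

def first_loop_link(links: List[Link]) -> Optional[Link]:
    """Return the first link that closes a layer-2 loop, or None if the
    links form a tree (or forest)."""
    parent: Dict[str, str] = {}
    for a, b in links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a == root_b:
            return (a, b)  # a and b were already connected: this link forms a cycle
        parent[root_a] = root_b
    return None
```

With only the S3952 to S3328 uplink and one AC service link the function returns None; appending a second cable between the AC and the S3328 (the heartbeat line, under the assumption above) makes it return that cable.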
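The behavior described here (shut the port on a loop, stay down until someone intervenes) can be modeled as a small state machine. This is a simplified sketch, not Cisco's implementation, and the class and method names are hypothetical. On Cisco IOS, automatic recovery can be enabled with the `errdisable recovery cause` and `errdisable recovery interval` commands; the optional `auto_recovery_s` parameter stands in for that setting.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ErrDisablePort:
    """Simplified model of a port that is shut down when a loop is detected.

    auto_recovery_s=None mimics the default behavior described above:
    the port stays disabled until an operator restarts it.
    """
    name: str
    auto_recovery_s: Optional[float] = None
    state: str = "up"
    disabled_at: Optional[float] = None

    def loop_detected(self, now: float) -> None:
        if self.state == "up":
            self.state = "err-disabled"
            self.disabled_at = now

    def tick(self, now: float) -> None:
        # Automatic recovery only happens if a recovery interval is configured.
        # (A real switch would disable the port again if the loop persists.)
        if (self.state == "err-disabled" and self.auto_recovery_s is not None
                and now - self.disabled_at >= self.auto_recovery_s):
            self.state = "up"
            self.disabled_at = None

    def manual_restart(self) -> None:
        # Equivalent of "shutdown" followed by "no shutdown" on the interface
        self.state = "up"
        self.disabled_at = None
```

Without a recovery interval, time passing does nothing: the port only comes back after `manual_restart()`, which matches the manual restart needed in this incident.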
Experience summary
Redundant backup designs are now widely used for security and stability, but improper operations can easily create loops. How can we prevent network loops and troubleshoot loop faults quickly and efficiently? Four points stand out:
⒈ Enable loop detection on switches.
Most mainstream switches support port loopback detection, but on some models it is disabled by default and must be enabled manually. In this incident, had loop detection been enabled on the 3328 switch, only the AC equipment under the IMS switch would have been affected, and important NEs such as the BSC and RNC would not have been disturbed.
⒉ Shut down all ports the switch is not currently using, and set a password on the console port.
This both improves network security and prevents accidental misoperation.
⒊ Troubleshoot from the bottom up.
Start with the physical layer, then the data link layer, and so on up the stack. Make full use of log messages, reference material, and network tools, and do not lean too heavily on experience: sometimes experience will lead you astray.
⒋ "Three parts technology, seven parts management": strengthen equipment room management.
Engineers entering the equipment room must strictly follow the declaration and approval process before starting work; during construction they must be accompanied by designated staff, with protective measures and emergency plans in place.
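Item ⒈ above relies on port loopback detection. The general idea is that the switch sends a detection frame carrying its own MAC address out of a port; if a frame with its own MAC comes back, a loop exists and the receiving port can be blocked. The sketch below simulates that check. The frame fields, the `network` mapping that stands in for the attached LAN, and the MAC and port names are hypothetical, and real implementations (such as Huawei's loopback-detection) differ in frame format and in the action taken.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set

@dataclass(frozen=True)
class DetectionFrame:
    src_mac: str   # bridge MAC of the switch that originated the probe
    src_port: str  # port the probe was sent from
    rx_port: str   # port on which the switch received it

def send_probes(own_mac: str, ports: Iterable[str],
                network: Dict[str, Optional[str]]) -> List[DetectionFrame]:
    """Send one probe per port. `network` is a hypothetical stand-in for the
    attached LAN: it maps each egress port to the local port where the frame
    reappears, or None if it never comes back."""
    received = []
    for port in ports:
        rx = network.get(port)
        if rx is not None:
            received.append(DetectionFrame(own_mac, port, rx))
    return received

def ports_to_block(own_mac: str, received: Iterable[DetectionFrame]) -> Set[str]:
    # A probe carrying our own MAC can only return through a loop
    return {f.rx_port for f in received if f.src_mac == own_mac}
```

Here a loop that links GE0/0/14 and GE0/0/15 outside the switch causes both probes to return, so both ports are flagged, while a port whose probe never returns stays forwarding.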
Addendum:
Network optimization after breaking the loop
1. Deploy a loop-breaking protocol
If the loop was caused by a newly introduced physical loop with no loop-breaking protocol configured, deploy one according to the network plan. Loop-breaking protocols commonly used on Ethernet switches include STP/RSTP/MSTP, RRPP, and SEP.
2. Improve link quality and reliability
If the loop stems from poor physical link quality, where protocol packets are lost or delayed by congestion until they time out and a temporary loop forms, check the links and replace the optical modules. If protocol packets are being dropped because bandwidth is insufficient, expand the bandwidth or use link aggregation (for example, when four uplinks, two to each aggregation switch, are left unbundled) to improve link reliability.
3. Deploy broadcast suppression to improve network robustness
To keep a recurring loop from triggering another broadcast storm, we recommend enabling broadcast suppression on the device ports in the ring. In our experience, a 1% broadcast suppression threshold prevents broadcast storms well.
4. Deploy QoS so that protocol packets are forwarded with priority
5. Optimize the network design to improve reliability
Manage complex networks hierarchically: plan and design the access and aggregation layers carefully. When a single-layer network contains many devices, divide it into separate domains according to logical organization and geographic distribution.
6. Bind MAC addresses to ports (note that this becomes troublesome in virtualized environments, where VMs migrate between hosts)
7. Raise the priority of the gateway port
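For item 1 of the addendum, the core job of STP is to elect a root bridge (lowest bridge ID) and keep only one path from every switch to it, blocking the redundant links. The sketch below is a deliberately simplified model: all links have equal cost, neighbors are visited in bridge-ID order, parallel links between the same pair are not modeled, and port IDs, timers, and BPDU exchange are ignored, so it illustrates the idea rather than reproducing IEEE 802.1D/802.1w behavior. Switch names and bridge IDs in the usage are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]

def spanning_tree(bridge_ids: Dict[str, int],
                  links: List[Link]) -> Tuple[str, Set[Link], Set[Link]]:
    """Simplified STP: root = lowest bridge ID, equal link costs, BFS from
    the root. Returns (root, forwarding links, blocked links)."""
    neighbors: Dict[str, List[str]] = {b: [] for b in bridge_ids}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)
    root = min(bridge_ids, key=bridge_ids.get)
    visited = {root}
    forwarding: Set[Link] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Visit neighbors in bridge-ID order so the result is deterministic
        for nbr in sorted(neighbors[node], key=bridge_ids.get):
            if nbr not in visited:
                visited.add(nbr)
                forwarding.add(tuple(sorted((node, nbr))))
                queue.append(nbr)
    # Every link not on the tree would be put in the blocking state
    blocked = {tuple(sorted(link)) for link in links} - forwarding
    return root, forwarding, blocked
```

With three switches wired in a triangle, the lowest-ID switch becomes root and the link between the two other switches ends up blocked, which is exactly the redundant path that would otherwise carry a storm.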
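For item 3, a percentage-based broadcast suppression threshold can be translated into packets per second and enforced with a per-interval counter. The sketch assumes 64-byte broadcast frames plus 20 bytes of per-frame wire overhead (preamble, SFD, and inter-frame gap); the helper and class names are hypothetical, and real switches enforce suppression in hardware with their own granularity. Under these assumptions, 1% of a 1 Gbit/s port is 10 Mbit/s, or about 14,880 minimum-size frames per second.

```python
from typing import Optional

def suppression_threshold_pps(link_bps: float, percent: float, frame_bytes: int = 64) -> int:
    """Broadcast packets per second allowed by a percentage-of-bandwidth threshold.
    Adds 20 bytes of per-frame wire overhead (8-byte preamble/SFD + 12-byte gap)."""
    bits_per_frame = (frame_bytes + 20) * 8
    return int(link_bps * percent / 100 / bits_per_frame)

class BroadcastSuppressor:
    """Drops broadcast frames beyond `limit_pps` within each one-second interval."""

    def __init__(self, limit_pps: int) -> None:
        self.limit_pps = limit_pps
        self.interval: Optional[int] = None
        self.count = 0

    def allow(self, timestamp_s: float) -> bool:
        interval = int(timestamp_s)
        if interval != self.interval:
            self.interval, self.count = interval, 0  # start a new one-second interval
        if self.count < self.limit_pps:
            self.count += 1
            return True
        return False
```

Usage: `BroadcastSuppressor(suppression_threshold_pps(1e9, 1))` forwards at most 14,880 broadcast frames per second on the assumed 1 Gbit/s port and drops the rest, so a looped storm is capped instead of saturating the link.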
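For item 4, the point of prioritizing protocol packets is that BPDUs and similar control frames should not wait behind a backlog of data frames, where they could time out and let a temporary loop form. A minimal strict-priority egress queue shows the effect; the traffic classes and their numeric priorities are assumptions for illustration, not a vendor's QoS configuration.

```python
import heapq
from itertools import count
from typing import List, Tuple

# Assumed priority classes: a lower number is served first
PRIORITY = {"protocol": 0, "voice": 1, "data": 2}

class StrictPriorityQueue:
    """Egress queue that always sends the highest-priority frame first,
    preserving arrival order within the same class."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = count()  # tie-breaker that keeps FIFO order per class

    def enqueue(self, traffic_class: str, frame: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[traffic_class], next(self._seq), frame))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

A BPDU enqueued behind several data frames is still sent first, while the data frames keep their original order.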
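For item 6, a static MAC-to-port binding accepts a frame only when its source MAC is bound to the port it arrived on. The sketch below uses hypothetical class, MAC, and port names. It also shows why VM migration is troublesome: once a VM moves to a host behind a different port, its frames are rejected until the binding is updated.

```python
from typing import Dict, Tuple

class PortMacBinding:
    """Hypothetical static MAC-to-port binding table."""

    def __init__(self) -> None:
        self.bindings: Dict[str, str] = {}  # normalized MAC -> bound port

    def bind(self, mac: str, port: str) -> None:
        self.bindings[mac.lower()] = port

    def check(self, src_mac: str, ingress_port: str) -> Tuple[bool, str]:
        """Accept the frame only if its source MAC is bound to the ingress port."""
        bound = self.bindings.get(src_mac.lower())
        if bound is None:
            return False, "unknown MAC"
        if bound != ingress_port:
            return False, f"MAC bound to {bound}"
        return True, "ok"
```

If a VM bound to GE0/0/3 migrates to a host behind GE0/0/7, `check` rejects its frames with "MAC bound to GE0/0/3" until someone calls `bind` again with the new port.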
© 2024 shulou.com SLNews company. All rights reserved.