2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
The network topology is as follows:
IRF virtualization is configured between the two operator access switches (that is, the two switches are virtualized into one logical switch), and VRRP hot backup is configured between the two load balancers.
The network is one large Layer 2 domain, and the gateway of each link is on the operator's side.
Port g2/0/11 of 5800-2 is connected to the notebook (223.1.5.41).
Port g1/0/3 of 5800-1 is connected to the mobile ISP gateway (223.1.5.1).
Problem:
A large number of users reported that pings from the servers to the mobile gateway (223.1.5.1) lost packets, connections were often dropped, and the network was very unstable.
With a problem like this, start looking for the fault at the nearest network node. First, a notebook configured with a mobile IP address (223.1.5.41) was connected to S5800-1 and pinged the mobile gateway. The ping was normal, which indicates that the optical fiber link from the mobile operator is fine.
Then the notebook was connected to S5800-2. Pinging the mobile gateway lost packets, while pinging the servers below was normal. This shows that packets are being lost between S5800-2 and S5800-1, and there is only one pair of optical fibers for IRF between the two S5800s. That might be the problem, so I replaced the IRF optical fiber and fiber modules. Incredibly, the problem remained.
C:\Users\Administrator> ping 223.1.5.1 -t
Pinging 223.1.5.1 with 32 bytes of data:
Reply from 223.1.5.1: bytes=32 time=1ms TTL=254
Request timed out.
Request timed out.
Request timed out.
Reply from 223.1.5.1: bytes=32 time=1ms TTL=254
Request timed out.
C:\Users\Administrator> arp -a
Interface: 223.1.5.2 --- 0xb
Internet Address    Physical Address     Type
223.1.5.1           00-00-5e-00-01-65    dynamic
223.1.5.41          00-22-15-4c-5d-42    dynamic
This is impossible. A mathematical model built from known conditions has a unique result, so a logical contradiction like this should not occur. There is only one pair of IRF optical fibers between the two S5800s, and data can only travel over that pair. If nothing is wrong with the fiber or the fiber modules, the only explanation is that some of the data was lost inside the switch after it reached S5800-1 over the IRF fiber.
Good! Let's collect traffic statistics to verify this:
telnet 10.10.10.12 \\ IP address of the S5800
sys
acl number 3876
rule permit ip source 223.1.5.41 0 destination 223.1.5.1 0
rule permit ip source 223.1.5.1 0 destination 223.1.5.41 0
quit
traffic classifier aaa
if-match acl 3876
quit
traffic behavior aaa
accounting packet
quit
qos policy aaa
classifier aaa behavior aaa
quit
interface GigabitEthernet 2/0/11
qos apply policy aaa inbound
qos apply policy aaa outbound
quit
interface GigabitEthernet 1/0/3
qos apply policy aaa inbound
qos apply policy aaa outbound
quit
Test: from the notebook (223.1.5.41), run ping 223.1.5.1 -n 100 \\ sends 100 packets; only 64 replies were received.
[5800] display qos policy interface GigabitEthernet 2/0/11
Interface: GigabitEthernet2/0/11
Direction: Inbound
Policy: aaa
Classifier: aaa
Operator: AND
Rule (s): If-match acl 3876
Behavior: aaa
Accounting Enable:
100 (Packets)
Direction: Outbound
Policy: aaa
Classifier: aaa
Operator: AND
Rule (s): If-match acl 3876
Behavior: aaa
Accounting Enable:
64 (Packets)
[5800] display qos policy interface GigabitEthernet 1/0/3
Interface: GigabitEthernet1/0/3
Direction: Inbound
Policy: aaa
Classifier: aaa
Operator: AND
Rule (s): If-match acl 3876
Behavior: aaa
Accounting Enable:
64 (Packets)
Direction: Outbound
Policy: aaa
Classifier: aaa
Operator: AND
Rule (s): If-match acl 3876
Behavior: aaa
Accounting Enable:
64 (Packets)
This shows that packets are lost inside the switch: 100 packets entered inbound on port g2/0/11 of S5800-2, but only 64 left outbound on port g1/0/3 of S5800-1. So where are the remaining 36 packets? Were they really lost inside the 5800-1 switch? Good! Let me take you inside the switch to see where the 36 missing packets went.
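The reasoning from the four counters can be written down as a small check. The Python sketch below is only an illustration: the function name `first_drop` and the ordering of the measurement points are assumptions made for this example, and the counter values are the ones reported by display qos policy interface above.

```python
def first_drop(path):
    """Return (previous_point, point, lost) for the first hop where the
    packet count falls, or None if nothing is lost along the path."""
    for (prev_name, prev_count), (name, count) in zip(path, path[1:]):
        if count < prev_count:
            return prev_name, name, prev_count - count
    return None


# Counters listed in the order the traffic flows: request first, then reply.
ping_path = [
    ("g2/0/11 inbound (request enters S5800-2)", 100),
    ("g1/0/3 outbound (request leaves S5800-1)", 64),
    ("g1/0/3 inbound (reply enters S5800-1)", 64),
    ("g2/0/11 outbound (reply leaves S5800-2)", 64),
]

print(first_drop(ping_path))
```

The only drop is between the first two points, so the requests disappear on the way from g2/0/11 to g1/0/3, while every reply that the gateway sends does reach the notebook.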
[5800-1] en_diag \\ enter hidden mode
[5800-1] debug port mapping 1 \\ show which internal port each panel port maps to
[Interface]  [Unit]  [Port]  [Name]  [Combo?]  [Active?]  [IfIndex]  [MID]  [Link]  [Attr]
==========================================================================================
GE1/0/1      0       3       ge2     no        no         0x900000   4      down    Bridge
GE1/0/2      0       2       ge1     no        no         0x900001   4      down    Bridge
GE1/0/3      0       5       ge4     no        no         0x900002   4      up      Bridge
..
..
XGE1/0/25    0       26      xe0     no        no         0xbc0018   4      up      Bridge
XGE1/0/26    0       27      xe1     no        no         0xbc0019   4      up      Bridge
XGE1/0/27    0       28      xe2     no        no         0xbc001a   4      up      Bridge
XGE1/0/28    0       29      hg0     no        no         0xbc001b   4      up      Bridge
Here we can see that internal port 5 of the switch corresponds to port g1/0/3, and internal port 27 corresponds to port XGE1/0/26.
Since frame forwarding on a Layer 2 switch depends only on MAC addresses, let's look at where the mobile gateway's MAC address 0x00005e000165 points. (It helps to understand the frame forwarding process of a Layer 2 switch first.)
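For readers who want a refresher on that forwarding process, here is a minimal Python sketch of a learning switch. It is a toy model under stated assumptions, not how the S5800 is implemented: the class `LearningSwitch`, the port names and the two host MAC addresses are all hypothetical. It shows the two rules that matter here: the source MAC of every frame is learned on the ingress port (the latest learning wins), and the egress port is looked up by destination MAC.

```python
class LearningSwitch:
    """Toy Layer 2 switch with a single MAC table."""

    def __init__(self):
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the egress port
        ("flood" when the destination MAC is not in the table)."""
        self.mac_table[src_mac] = in_port
        return self.mac_table.get(dst_mac, "flood")


HOST_A = "0000-0000-000a"  # hypothetical addresses
HOST_B = "0000-0000-000b"

sw = LearningSwitch()
first = sw.receive("port1", HOST_A, HOST_B)   # B is unknown, frame is flooded
second = sw.receive("port2", HOST_B, HOST_A)  # A was learned on port1
sw.receive("port3", HOST_A, HOST_B)           # A's MAC shows up on port3
third = sw.receive("port2", HOST_B, HOST_A)   # the entry for A has moved

print(first, second, third)
```

The last lookup returns port3 instead of port1: whichever port most recently received a frame with a given source MAC receives the traffic for that MAC.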
[5800-diagnose] bcm 1 0 l2/conflict/mac=0x00005e000165/vlan=5
(slot1) (layer 2 / conflict / mac/vlan)
Conflict: mac=00:00:5e:00:01:65 vlan=5 modid=4 port=5/ge4 SDHit Group=Learnt
[5800-diagnose] bcm 1 0 l2/conflict/mac=0x00005e000165/vlan=5
Conflict: mac=00:00:5e:00:01:65 vlan=5 modid=4 port=5/ge4 SDHit Group=Learnt
[5800-diagnose] bcm 2 0 l2/conflict/mac=0x00005e000165/vlan=5
(slot2) (layer 2 / conflict / mac/vlan)
Conflict: mac=00:00:5e:00:01:65 vlan=5 modid=4 port=5 SDHit Group=Learnt
[5800-diagnose] bcm 2 0 l2/conflict/mac=0x00005e000165/vlan=5
Conflict: mac=00:00:5e:00:01:65 vlan=5 modid=4 port=27 SDHit Group=Learnt
Note: four lookups were made in total. The first two are on slot1, that is, S5800-1, where the MAC address stays on port=5 and does not drift.
The last two are on S5800-2, where the MAC address is drifting: once it is on port=5, once on port=27.
port=5 (GE1/0/3) and port=27 (XGE1/0/26) indicate that, as seen from S5800-2, mac=0x00005e000165 appears both on port G1/0/3 (connected to the mobile gateway) and on port XGE1/0/26 (connected to load balancer-1).
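The same reading of the four lookups can be expressed as a short Python sketch. It is illustrative only: the helper `ports_seen_per_slot` is a hypothetical name, the port mapping contains just the two rows needed from the debug port mapping output, and it assumes, as the article does, that both internal port numbers refer to panel ports of S5800-1.

```python
# Internal port -> panel port, taken from the "debug port mapping 1" output.
PORT_MAP = {5: "GE1/0/3", 27: "XGE1/0/26"}

# (slot, internal port) reported by the four lookups of 0000-5e00-0165.
lookups = [(1, 5), (1, 5), (2, 5), (2, 27)]


def ports_seen_per_slot(samples, port_map):
    """Group the observed panel ports by slot. More than one port per slot
    for the same MAC address means the entry is drifting."""
    seen = {}
    for slot, port in samples:
        seen.setdefault(slot, set()).add(port_map.get(port, f"port {port}"))
    return seen


for slot, ports in sorted(ports_seen_per_slot(lookups, PORT_MAP).items()):
    state = "drifting" if len(ports) > 1 else "stable"
    print(f"slot {slot}: {sorted(ports)} -> {state}")
```

With only two samples per slot this is a hint rather than proof, but it is enough to point at the second port where the MAC address is being learned.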
How did mac=0x00005e000165 come to appear on load balancer-1? Could the 36 lost packets have gone to load balancer-1?
Logging in to load balancer-1 showed that the virtual MAC address of one VRRP group (VRID=101) really is mac=0x00005e000165, the same as the MAC address of the mobile gateway. What is puzzling is that the load balancer configuration had not been changed for a year. Could the mobile operator have changed its MAC address?
So as not to affect the business, bind the MAC address of the mobile gateway immediately.
Solution:
Bind the MAC address of the mobile gateway 223.1.5.1 to port g1/0/3.
telnet 10.10.10.12 \\ log in to the S5800
interface GigabitEthernet1/0/3
mac-address static 0000-5e00-0165 vlan 5
A phone call to the mobile operator explained it: the previous night, the operator had added another BRAS device in the machine room and also configured it for active/standby VRRP, and its VRID happened to be 101. When a VRRP group is created, its virtual MAC address is not random but derived from the VRID: VRID 101 gives MAC=0000-5e00-0165, VRID 102 gives MAC=0000-5e00-0166, and so on.
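The mapping from VRID to virtual MAC address is simple enough to compute. The sketch below follows the IPv4 VRRP convention, in which the first five bytes are fixed at 00-00-5E-00-01 and the last byte is the VRID; the function name `vrrp_virtual_mac` is made up for this example, and the output uses the H3C-style four-digit grouping seen in this article.

```python
def vrrp_virtual_mac(vrid):
    """Return the IPv4 VRRP virtual MAC address for a VRID (1-255),
    formatted in groups of four hex digits."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    raw = f"00005e0001{vrid:02x}"  # fixed prefix + VRID as the last byte
    return "-".join(raw[i:i + 4] for i in range(0, 12, 4))


print(vrrp_virtual_mac(101))  # 0000-5e00-0165
print(vrrp_virtual_mac(102))  # 0000-5e00-0166
```

Because the address depends only on the VRID, any two VRRP groups with the same VRID in the same Layer 2 domain end up with the same virtual MAC address, no matter who owns the devices.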
Neither the BRAS device nor the load balancer offers an option like vrrp method real-mac to use the real interface MAC address, which led to the MAC address conflict.
Many devices have a VRRP hot backup function, but either do not configure or do not support the real MAC address feature.
Careful readers may have noticed that this is a weakness of VRRP that can cause large-scale network failures.
This article uses some switch debugging and configuration commands that are hard to find on the Internet, such as the traffic statistics configuration and the H3C hidden-mode debugging commands.
My purpose in writing this article is to explain a method of network troubleshooting: a mathematical model built from known conditions has a unique result, so when a problem seems illogical, one of the given known conditions is wrong!